Introduction



What is an authentic digital object? On January 24, 2000, the Council 
on Library and Information Resources (CLIR) convened a group of 
experts from different domains of the information resources commu- 
nity to address this question. To prepare for a fruitful discussion, we 
asked five individuals to write position papers that identify the at- 
tributes that define authentic digital data over time. These papers, 
together with a brief reflection on the major outcomes of the work- 
shop, are presented here. 

Our goal for this project was modest: to begin a discussion 
among different communities that have a stake in the authenticity of 
digital information. Less modestly, we also hoped to create a com- 
mon understanding of key concepts surrounding authenticity and of 
the terms various communities use to articulate them. 

"Authenticity" in recorded information connotes precise, yet dis- 
parate, things in different contexts and communities. It can mean be- 
ing original but also being faithful to an original; it can mean uncor- 
rupted but also of clear and known provenance, "corrupt" or not. 

The word has specific meaning to an archivist and equally specific 
but different meaning to a rare book librarian, just as there are differ- 
ent criteria for assessing authenticity for published and unpublished 
materials. In each context, however, the concept of authenticity has 
profound implications for the task of cataloging and describing an 
item. It has equally profound ramifications for preservation by set- 
ting the parameters of what is preserved and, consequently, by what 
technique or series of techniques. 

Behind any definition of authenticity lie assumptions about the 
meaning and significance of content, fixity, consistency of reference, 
provenance, and context. The complexities of these concepts and 
their consequences for digital objects were explored in Preserving 
Digital Information: Report of the Task Force on Archiving of Digital Infor- 
mation, published by the Commission on Preservation and Access in 
1996. There is no universally agreed-upon mandate about what must 
be preserved and for what purpose. For example, an archivist will 
emphasize the specifications of a record that bears evidence; a librar- 
ian will focus on the content, knowing that it could serve multiple 
purposes over time. That being the case, there may be many ways to 
describe an item being preserved and what aspects of that item must 
be documented to ensure its authenticity and its ability to serve its 
intended use over time. For certain purposes, some argue, migration 
may suit the preservation needs of a digital object. For those objects 
most valued as executable programs, others argue, emulation is
preferable. Beyond the technical options undergirding metadata and
preservation decisions, numerous nontechnical questions beg to be 
asked. The issue of authenticity must be resolved before humanists 
and scientists can feel confident in creating and relying upon digital 
information. 

Creating a common understanding about the multiple meanings 
and significance of authenticity is critical in the digital environment, 
in which information resources exist in many formats yet are interac- 
tive. From peer-reviewed journal articles to unpublished e-mail cor- 
respondence, these resources are integrated; they can interact and be 
modified in a networked environment. We wanted to know whether 
the distinctions that have proved to be helpful heuristic devices in 
the analog world, such as edition or version, document or record, 
could help us define a discrete piece of digital information. Can we 
define the distinct attributes of an information resource that would 
set the parameters for preservation and mandate specific metadata 
elements, among other important criteria? 

We charged the five writers — an archivist, a digital library ex- 
pert, a documentary editor and special collections librarian, an ex- 
pert on document theory, and a computer scientist — to address one 
essential question: What is an authentic digital object and what are 
the core attributes that, if missing, would render the object some- 
thing other than what it purports to be? We asked each to address 
this question from the perspective he found most congenial. We em- 
phasized our interest in the essential elements that define a digital 
object and guarantee its integrity, but left the writers free to grapple 
with that question as they saw fit. 

In considering this central issue, we asked that they think about 
the following: 

• If all information — textual, numeric, audio, and visual — exists as 
a bit stream, what does that imply for the concept of format and 
its role as an attribute essential to the object? 

• Does the concept of an original have meaning in the digital envi- 
ronment? 

• What role does provenance play in establishing the authenticity 
of a digital object? 

• What implications for authenticity, if any, are there in the fact 
that digital objects are contingent on software, hardware, net- 
work, and other dependencies? 

These are some of the issues that we anticipated would arise in 
the course of the workshop. 

In thinking of which communities to include in the workshop 
discussion, CLIR sought expertise from the major stakeholders in 
these issues: librarians, archivists, publishers, document historians, 
technologists, humanists, and social scientists. Because so many con- 
cepts of authenticity derive directly from experience with analog in- 
formation, we called upon experts in the traditional technologies, 
such as printing and film, to elucidate key concepts and techniques 
for defining and securing authenticity of information bound to a
physical medium. 

The authors were given time to revise their papers in light of the 
discussion and any comments they received from the participants. 
Some chose to revise their papers, and others did not. The task of 
writing a position paper on this complex subject (a paper that we 
limited in size but not scope) was quite difficult. Each writer took a 
different approach to the subject, and the papers differ greatly one 
from another. This seeming disparity proved a boon to the discus- 
sions. During that time, each writer had a chance to "unpack" the 
various nuances of thought that the papers held in short form only, 
and participants were confronted with the diverse ways that such 
common words as copy, original, reliable, or object are used. Much of
the substance of the discussion is included in the concluding essay. 

As one participant remarked, authenticity is a subject we have 
avoided talking about, primarily because the issues it raises appear 
so intractable. We are deeply grateful to Messrs. Cullen, Hirtle, Levy, 
Lynch, and Rothenberg for agreeing to form the advance party as we 
ventured into terra incognita. They were willing not only to think 
deeply about a vexing issue but also to commit their thoughts to 
writing and to careful scrutiny by others. Their papers, together with 
the oral summaries they delivered at the meeting, marked out sever- 
al different trails to follow, each of which opened onto ever-larger 
vistas — some breathtaking, some daunting. We are also grateful to 
the participants, many of whom came from very distant places. Their 
thoughtful preparation and frank discussion confirmed our sense 
that authenticity is important for many communities, and that they 
are ready to engage the issue. 



Abby Smith 
Director of Programs 
Authentication of Digital Objects:
Lessons from a Historian’s Research 

by Charles T. Cullen 



The issues stemming from authenticating digital objects are
quite similar, and in some cases identical, to those relating to
holographs or printed books. Everyone dealing with impor- 
tant material in any form should approach it with a bit of skepticism, 
but scholars especially need to question what it is they are using. In 
other words, they need to authenticate all documentation they use in 
the processes of learning and of creating new scholarship. An au- 
thentic object is one whose integrity is intact — one that is and can be 
proven or accepted to be what its owners say it is. It matters little 
whether the object is handwritten, printed, or in digital form. 

Over time, we have established various measures of authenticity 
for analog forms that we trust almost without question. Our trust is, 
however, much greater for printed books than for handwritten ob- 
jects. In fact, handwritten objects raise many of the same questions of 
authenticity as digital objects do. The difference is that in the case of 
the former, the answers may be more easily found. Take Thomas Jef- 
ferson's manuscript "Report on the Navigation of the Mississippi," 
for example. Could he have written it? Is it his handwriting? Is the 
paper watermarked, and from the appropriate time period? Is the 
ink contemporary? Do other copies of the manuscript exist? Has its 
recipient or any other contemporary endorsed it? Is there other inter- 
nal evidence? Who has described it for us? Has it been identified by 
a trusted third party? 

Is a book authentic? Who published it, and who wrote it? Can 
they be trusted (are they worthy of one's research time)? Is the rare 
book what it purports to be? Is the manuscript correspondence actu- 
ally by the person to whom it is attributed, and is its date accurate? 
These questions are now being asked more openly of objects that 
originate in digital form because we have not yet adopted practices 
or standards for providing ready answers to them. When objects are 
presented digitally, deciding what is required to authenticate them
may be informed by past practices with non-digital objects.

Two experiences with paper objects inform my views of this sub- 
ject. The first is a multi-page autograph document that lies in the 
John Marshall Papers at the Virginia State Library. It is labeled in the 
hand that wrote the entire piece, "John Marshall's Notes on Evidence 
in Commonwealth v. Randolph, 1796." Although the title itself might 
raise some question about who penned it (How often does an au- 
thor — even an eighteenth-century author — use his own name in a 
title of one of his documents?), this document has been used for decades
as the source of historical articles and at least one full-length
book on the investigation of Richard Randolph for murder. Randolph,
a member of the famed Randolph family of Virginia, was related
to Marshall and to Thomas Jefferson and to many other members of 
Virginia's "first families." 

Examination of the writing by those familiar with John Mar- 
shall's hand, however, quickly reveals that he did not pen this docu- 
ment. Knowing who did write it is important, but does not help 
make it more authentic as a Marshall document. The possibility that 
someone in possession of Marshall's holograph could have copied 
the document raises new questions, not the least of which assigns 
significant importance to the value of the original of a document, re- 
gardless of its form. Internal evidence, obtained by a close reading of 
this document, reveals that it might be a partial transcript of a hear- 
ing in Cumberland County Court where witnesses are questioned by 
attorneys, and it has been used by historians as a partial record of 
Randolph's "trial." But Marshall's name never appears as one of the 
questioners, and a knowledge of Virginia law at the time would re- 
veal that whatever was taking place could not be an actual trial, be- 
cause white men could not be tried for felonies at the county court 
level during that period. In short, efforts to authenticate this docu- 
ment raise more questions than they answer. At the least, such efforts 
reveal that the document may not be what many had long thought it 
to be, and that it may not be even what its title says it is. 

This example is somewhat esoteric, to be sure, because it is un- 
likely that one would use a similar digital document without asking 
the questions that eventually were asked of the document attributed 
to Marshall. But the questions asked of it suggest attributes that must 
be held by a holograph as well as by a digital object that is to be re- 
garded as authentic. Is it the author's work or a copy? Is it what its 
title purports it to be? What tests can be applied to answer these 
questions convincingly? 

A second example is more to the point. In the collection of Tho- 
mas Jefferson's papers at the Library of Congress is a document that 
appears to be a list of letters written and received between 1791 and 
1793, a period of time during which Jefferson was Secretary of State. 
An examination of the handwriting reveals that it is most likely Jef- 
ferson's, but the list is unlike his other journals of letters sent and re- 
ceived. A close look at the original document suggests that it was 
written in only one or two sittings (the ink changes only once or 
twice), rather than over three years. The most significant evidence
relating to this document's authenticity lies in the paper itself. Hold- 
ing it before a light source reveals a watermark that indicates the pa- 
per was manufactured in 1804. The document, therefore, could not 
be an authentic 1791-1793 document. 

Almost all these tests can be applied to digital objects, and they 
need to be. But because digital objects bear less evidence of author- 
ship, provenance, originality, and other commonly accepted at- 
tributes than do analog objects, the former are subject to additional 
suspicion. Tests must be devised and administered to authenticate 
them. 

In many cases, problems of authentication arising from objects 
that originate as digits are obvious. In trying to find solutions to 
those problems, however, we must carefully test all suggestions to 
ensure that they do not themselves open new issues that may be in- 
herent in this medium. The problems of preserving digital objects 
have received more attention than have questions of authentication 
(people, I suppose, are less worried about authenticity than about 
preservation). But why preserve what is not authentic? Might the 
preservation of a digital object imply an endorsement of authenticity, 
even if nothing else is done to it? More than one archivist has stated 
that the only sure means of preserving a digital object is to save a 
printed copy. Concerns with format codes, migrations from version 
to version, dependence on hardware — would all be solved by print- 
ing a copy (or many copies) and putting it (them) in a safe place. Do 
that to a digital object before confronting the questions of authentici- 
ty, and all that is valuable may be lost. Converting a digital object 
from one program to another, or migrating it from version to version, 
could present problems of authenticity that may or may not be 
solved by careful attention to provenance. 

A digital object must be authenticated at the time of its creation 
by a means that will convey a high degree of confidence to all users, 
including subsequent use by the originator. Clifford Lynch wrote an 
interesting and convincing article on the integrity of digital information,
published in the December 1994 issue of the Journal of the American
Society for Information Science. He seems to assume, from traditional
experience perhaps, that readers will be responsible for authenticat- 
ing copies being used on the basis of cataloging data to which they 
must be alert. Retrieving electronic files by title, for example, might 
lead one to a revised work, different from the original. The reader 
must exercise caution, Lynch writes, and be ready to detect signs of
alteration. "The expectation should be that violations of integrity 
cannot be trivially accomplished," he says. Accepting this in the 
world of printed objects is relatively easy. It is much more difficult in 
the realm of electronic digital information. 

Andy Hopper of Cambridge University suggests an authentica- 
tion strategy that is worthy of consideration, if not adoption. In his 
system, the concept of a trusted third party is borrowed from the 
print world. According to this concept, trusted librarians help
authenticate their print holdings through recognized acquisition processes,
accepted cataloging procedures, and careful stewardship of
their collections, especially those in manuscript form. If a special collection
librarian tells us, either directly or by means of a catalog card,
that the book in hand is one of two extant copies of Ariosto's Orlando 
Furioso printed on vellum in Venice in 1542, and that it was prepared 
for the dauphin of France, the library's and the librarian's reputation 
go a long way toward instilling some degree of confidence that the 
document is indeed authentic. Moreover, all of this information may 
be checked. If another librarian delivers to a reader a box of letters 
cataloged as Ernest Hemingway's, authentication is assumed until 
internal or physical evidence suggests someone has made a mistake. 
Knowing that the materials — hard-copy objects — have gone through 
a process of description and identification, if not authentication, con- 
veys a sense of trust that they are authentic, at least until proved oth- 
erwise. Some of the problems of description that help authenticate 
printed special collection objects have similar, if not identical, exam- 
ples in the digital world. Take one final example as evidence: in the 
Newberry Library's special collections is a printed copy of the classic 
book on rhetoric in Renaissance England, Arte of Rhetorique by Thomas
Wilson (1525-1581). This particular copy is identified as having
belonged to Elizabeth I as part of her royal library, and it is authenti- 
cated as such by its original binding, which bears the mark of the 
royal arms. Book historians know that until the time of James I, the 
royal arms were put only on the books within the monarch's own 
library. (After 1603, King James allowed them to be placed on books 
bound for other members of court.) Elizabeth's coat of arms on the 
binding of this copy of Wilson's Rhetorique therefore marks it as au- 
thentic, as long as external evidence does not dispute it. (If it could 
be shown that the binding was not sixteenth century, for example, or 
that it resembled the work of a sixteenth-century forger, the authenticity
might be questioned.)

Some accepted system of similar assumption of authentication 
needs to exist in the digital world, but it is more difficult to achieve 
because digital material is more changeable, accidentally or deliber- 
ately. Andy Hopper and others suggest that some means of marking 
digital objects could help solve many of the problems of authentica- 
tion. Hopper argues that libraries might serve as authenticators by 
marking digital objects by some means that would remove doubt as 
to their characteristics at time of origin. A method must be developed 
whereby a trusted third party, ideally a trusted librarian, would put a 
marker on a digital document — a marker that could not be predicted 
or devised (guessed) — that would mark the document's time and 
date. The marker might be a number based on cosmic rays at various
times during the day, a number large enough to prevent guessing 
(Hopper suggests 100 digits). A professor writing a paper could send 
the document to the librarian to be marked, and it could then be re- 
turned to and held by the author. In the future, the object could be 
authenticated by its marker, regardless of who held it. Any change in 
the document would remove the marker. This procedure would be 
used by librarians who receive digital objects from donors. The 
marker would ensure that digital objects are as authentic as analog
objects at time of cataloging. 
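
The marking procedure described above can be illustrated with a minimal sketch. It is not Hopper's system: the class name TrustedMarker, its methods, and the use of a keyed hash (HMAC-SHA-256) held by the third party are all assumptions made for illustration, and a software random number generator stands in for the physical source of randomness the text mentions. The sketch shows only the two properties the paragraph asks for: the marker cannot be guessed, and any change to the document invalidates it.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timezone


class TrustedMarker:
    """Hypothetical stand-in for the trusted third party (e.g., a librarian)."""

    def __init__(self):
        # Secret key known only to the third party (an assumption of this sketch).
        self._key = secrets.token_bytes(32)

    def mark(self, document: bytes) -> dict:
        # Unguessable number of 100 decimal digits, echoing the size Hopper suggests.
        nonce = "".join(secrets.choice("0123456789") for _ in range(100))
        stamped_at = datetime.now(timezone.utc).isoformat()
        digest = hashlib.sha256(document).hexdigest()
        return {
            "nonce": nonce,
            "stamped_at": stamped_at,
            "tag": self._tag(digest, nonce, stamped_at),
        }

    def verify(self, document: bytes, marker: dict) -> bool:
        # Recompute the tag from the document as presented; any change to the
        # document, the nonce, or the timestamp produces a different tag.
        digest = hashlib.sha256(document).hexdigest()
        expected = self._tag(digest, marker["nonce"], marker["stamped_at"])
        return hmac.compare_digest(expected, marker["tag"])

    def _tag(self, digest: str, nonce: str, stamped_at: str) -> str:
        message = "|".join([digest, nonce, stamped_at]).encode("utf-8")
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()
```

In this sketch the author keeps the document together with its marker, and anyone who later holds the pair can ask the third party to call verify. The design choice to keep the key with the librarian mirrors the essay's reliance on a trusted party; it also means the scheme is only as durable as that party's custody of the key.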

Despite its science-fiction flavor, such a method seems to meet 
accepted tests of authenticity. A trusted third party can claim nothing 
more about an object, analog or digital, than what can be cataloged, 
and that information derives largely from physical evidence. Identi- 
fying an object in a catalog record or a collection description puts a 
marker on it that most of us use as the first step in the process of au- 
thentication. Is the document what it purports to be or what its own- 
er claims it to be? Scholars often require means to test the cataloger, 
and the physical attributes of analog objects offer more opportunities 
to do such testing than do those of digital objects. Handwriting, pub- 
lishing history, bindings, watermarks, inks, and various forms of in- 
ternal evidence provide answers to questions of authenticity in ana- 
log objects that are lacking in digital objects. Digital objects have 
attributes that can be used to help with authentication, but none is 
sufficiently trustworthy or stable to be acceptable unless a workable 
system of certain marking can be devised. 

Certifying that a digital object is the product of its author is diffi- 
cult when the object originates in electronic form. Without a deliber- 
ate and distinctive marking caused by the author that could not be 
guessed by another or altered by anyone, it seems impossible to au- 
thenticate an electronic document beyond doubt. Authors of files or 
images must take steps to establish authorship of their work; if not, 
our only option is to accept the assertions of others. Electronic files 
left behind by someone who has not taken action to establish author- 
ship are subject to suspicion if authorship is asserted by anyone else 
at the time of "cataloging." This leaves us where we have been all 
along — at the mercy of catalogers. But, in the case of a digital object, 
we are actually worse off than we would be if we were dealing with 
an analog object. This is because we lack the physical evidence pro- 
vided by analog objects — evidence that offers the means to test the 
cataloger. This ability to test both reassures the user and helps keep 
the cataloger honest. I find no corollary in the digital object realm. 

The concern over authenticating digital copies of analog objects 
is almost as important as that relating to objects that originate in dig- 
ital form. Scholars are keenly interested in having access to docu- 
mentary evidence in digital form, and librarians have begun to con- 
sider digitization a desirable means of preservation, in spite of the 
recognized problems inherent in it. Those who hope to use this mate- 
rial, once it has been digitized, must be able to rely on its authentici- 
ty, just as they have become accustomed to do in all the forms cur- 
rently available. Documentary editors, as well as librarians, have 
new responsibilities as they publish and provide access to their ma- 
terials in digital form with all the value they have added intact. The 
work of documentary editors offers some insight into the questions 
raised over authenticity of digital objects, especially those that derive 
from analog or holographic objects. 

The first task of a documentary editor who is working on an edition
of a subject's papers is to locate all the objects that have ever existed
as part of the corpus, incoming as well as outgoing. This sometimes
requires reliance on copies of papers that evidence suggests
once existed in original form but which have not been found. Once 
the collection is organized, each item must be dealt with separately. 
That is the first stage for authentication tests, starting with the ques- 
tion of whether the item is what it appears to be. All the available 
physical attributes assist in answering these questions, but some- 
times only internal evidence leads to a final answer (as in the Mar- 
shall and Jefferson examples described earlier). The editor is obliged 
to share these findings with readers and to describe the item in such 
a way that few, if any, questions remain about the document as ob- 
ject. Not unimportant in this description is all available information 
about other copies of an item, be they photocopies, carbons, letter- 
press, polygraph, drafts, or additional holographs. Knowing as much 
as the editor about all copies is the only sure way for other readers to 
test the "cataloger's" description, and only by having this informa- 
tion available can a reader have full confidence that all questions of 
authenticity have been asked. In preparing digital files of historic 
documents, editors begin their publication by attaching a full docu- 
ment description to a transcript. This is the scholar's seal of authen- 
ticity, as it were, or at least as much of a seal as a scholarly editor can 
provide. 

Preparing a digital transcript of a historic object introduces new 
problems to the issue of authentication. How do we know the tran- 
scription is accurate and that it is exactly what the editor prepared 
originally? The method of providing access to journal articles adopt- 
ed by JSTOR may offer the best answer for authenticating modern 
digital transcripts of manuscripts or printed material that originated 
in analog form. They provide the user with a digital transcription of 
the text, which is fully searchable and otherwise subject to all the va- 
garies of digital files. They also provide an image of the original text. 
If both copies could carry some form of marking that could not be 
manipulated, the problem of authentication would be solved. This 
system should work quite well for documentary editors and the 
readers of their digital publications. Providing an image of the docu- 
ment that is transcribed would be an important improvement over 
present forms of presentation, because it would permit easy verifica- 
tion of transcriptions. Inaccurate transcriptions are the downfall of 
documentary editors (as they should be), and mistakes often go un- 
detected. The reader, who may have a high level of confidence in the 
scholarly work of the editor, is left to assume that the transcription is 
accurate and authentic. Having a means of testing this assumption 
would be a great improvement. 
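
One way to picture the paired transcript and image is a small record that binds the two copies to the editor's document description. The sketch below is an assumption for illustration only; it is not JSTOR's method, and the function names build_manifest and check_copies are hypothetical. It uses SHA-256 digests to show whether either copy has changed since the editor prepared it. It cannot show that the transcription is accurate; that still requires a reader to compare the transcript against the image.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(description: str, transcript: str, image_bytes: bytes) -> dict:
    # Record the editor's description alongside a digest of each copy.
    return {
        "description": description,
        "transcript_sha256": sha256_hex(transcript.encode("utf-8")),
        "image_sha256": sha256_hex(image_bytes),
    }


def check_copies(manifest: dict, transcript: str, image_bytes: bytes) -> dict:
    # Compare the copies as delivered against the digests the editor recorded.
    return {
        "transcript_unchanged":
            sha256_hex(transcript.encode("utf-8")) == manifest["transcript_sha256"],
        "image_unchanged":
            sha256_hex(image_bytes) == manifest["image_sha256"],
    }
```

A manifest of this kind is only as trustworthy as the channel through which it reaches the reader, which is why the essay's call for a marking "that could not be manipulated" remains the harder part of the problem.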

Related problems that arise from considerations of authenticity 
seem to offer little to assist us in answering the primary question. 
Creating a digital file, and even marking it in such a way that will 
ensure authenticating it as my own, will mean little if the file itself 
cannot be read at any point in the future. If the file cannot be read, it 
cannot be authenticated as mine. (It would be even more maddening 
if the file could be authenticated but not read.) The same can be said
for provenance. If a file can be marked in such a way that its authen-
ticity is assured, issues of its subsequent provenance might not mat- 
ter in questioning its authenticity. But if a file cannot be read, its 
provenance will mean little, even if it can be tracked over a long peri- 
od. Without a marker of authenticity, provenance of a digital object 
would be of limited use in establishing authenticity. It would help 
test the cataloger, but the current technology would render uncertain 
any assertions of authenticity. The instability of software alone 
would introduce questions that would challenge any claims of au- 
thenticity suggested by a trusted provenance. 

Paul Conway (1999) says the existence of digital objects moves 
challenges of preservation from guaranteeing the physical integrity 
of objects to assuring their intellectual integrity, including their au- 
thenticity. He adds that librarians can control this by "authenticating 
access procedures and documenting successive modifications" to 
digital files. Authenticating access procedures may affect provenance 
more than the integrity of the digital object itself, but it would be dif- 
ficult to guarantee authentication with only this control. It seems 
that, in this argument, the alteration of an original record is accept- 
able as long as it is documented. Acceptance of changes with docu- 
mentation is unreasonable over time and places unnecessary bur- 
dens on users. In this case, as in others, preservation without 
authentication results in a loss of intellectual integrity. 

We are not close to having a means of marking digital docu- 
ments that cannot be challenged — a means that would establish au- 
thenticity. Absent such a technique, we are left to consider what oth- 
er attributes, if any, might approach the establishment of 
authenticity. Few suggest any high degree of confidence that would 
come close to what we have for analog materials, but consideration 
of the problem raises some issues that relate to other concepts that 
bear on the problem. How confident can one be when an object 
whose authentication is crucial depends on electricity for its exist- 
ence? Surely there are higher degrees of confidence in some cases 
than in others, but something more than provenance or traditional 
testing methods established for analog objects is needed. I believe it 
is easier to describe the characteristics of an authentic digital object 
than to support the authentication beyond a reasonable doubt. My 
definition is conditional; it depends on an object's capability of being 
proved to be authentic. Establishing a method of authentication of 
digital objects that would be unconditional may be possible. At the 
least, we must agree on some means of testing the authentication of 
digital objects. The consequences of not doing so are dire. 
Archival Authenticity in a Digital Age

by Peter B. Hirtle 



Archival Authenticity: An Example 

Downtown Baltimore is a vibrant, dynamic place filled with
new office towers and hotels that rise above shops, plazas, 
and museums. At the heart of Baltimore is the Inner Harbor, 
an area that is crowded year-round with residents and tourists who 
are sightseeing, dining, shopping, or watching baseball at nearby 
Camden Yards. Over the past two decades, the Inner Harbor has be- 
come the living center of a revitalized downtown. 

The defining feature of Baltimore's Inner Harbor, unlike that of 
so many American cities, is not a glass structure, a shining space nee- 
dle, or a distinctive sculpture. The Harbor is marked instead by the 
sturdy masts and graceful spars of the USS Constellation, a historic
wooden-hulled naval vessel permanently moored there. 

The famous ship arrived in Baltimore in 1955, and for the next 35 
years, the city celebrated its frigate, taking pride in the illustrious 
history of a ship that had been built in Baltimore in 1797 as a sister 
ship to the equally famous USS Constitution anchored in Boston. The 
story of the Constellation took a different turn in 1991, however, with 
the publication of Fouled Anchors: The Constellation Question Answered,
a report by Dana Wegner, the chief of ship models at the U.S. Navy's 
David W. Taylor Research Center in Carderock, Maryland. Rumors 
had circulated for half a century that the Constellation was not what 
its promoters claimed it to be, and Wegner's report confirmed them. 
Investigators from the Navy discovered that the supposed Revolutionary
War-era frigate in Baltimore Harbor was actually a Civil War-era
sloop that had been built in Norfolk, Virginia, in 1854. All it
shared with the frigate built in Baltimore in the eighteenth century 
was its name. It resembled a Revolutionary War-era frigate because 
during early renovations, some of the ship's admirers had "restored" 
the Constellation to appear to be almost 60 years older than it was; for 
example, they added a second gun deck and made other alterations. 
For most of its tenure in Baltimore, the Constellation was living a lie 
(Wegner 1991; LeDuc 1999). 

Many themes are at work in the story of the true identity of the 
Constellation. Early citizens of Baltimore, for example, seemed to 
have a stronger need to connect to the Revolutionary War than to the 
Civil War. They may have felt that "older is better," and that the ship 
would be of greatest interest if it was thought to have a Baltimore 
connection (i.e., if it had been built there). Nonetheless, their distor- 
tion of history came at the expense of the Constellation's very interest- 
ing own history. It was, for example, the last and largest all sail-pow- 
ered sloop commissioned by the U.S. Navy, and while it did not 
engage in a famous sea battle, as did its predecessor, it did work to 
interdict the slave trade during the mid-1800s. 

The most interesting themes in the Constellation story, however, 
revolve around the issue of authenticity — not the authenticity of the 
ship itself, but rather the authenticity of the documentation about the 
ship. For it was not just the appearance of the ship that was "forged," 
but also the written record concerning the ship. 

Some of the changes to the written record may not have been an 
intentional effort at deceit. Between 1854 and 1908, for example, the 
annual reports of the Navy listed the ship as having been built in 
Norfolk in 1854; however, from 1909 onward, the reports listed the 
ship as having been built in Baltimore in 1797. Was this an intention- 
al effort to deceive or an honest effort to correct what naval officers 
may have thought was a past mistake? Wegner could not determine 
the answer. 

In the 1950s, however, documents began to appear that Federal 
Bureau of Investigation (FBI) investigators later determined were 
forged. One document, allegedly written in 1918, was found to have 
been written with a typewriter made after 1946. Some of the forged 
documents in the possession of researchers bore forged stamps indi- 
cating that they were copies of records found in the National Ar- 
chives. Other forged documents were inserted into historical files at 
the National Archives and at the Franklin Roosevelt Presidential Li- 
brary, where they were subsequently "found" by researchers. 

The need to alter the archival written record to conform to a par- 
ticular historical interpretation speaks to the power of archives to au- 
thenticate. At rest in Baltimore Harbor was a physical artifact, a 
wooden ship, measuring over 180 feet long and weighing several 
hundred tons. The existence of the artifact per se, however, was not 
enough to establish its authenticity. To confirm beyond doubt the 
nature and history of the Constellation, both supporters and critics of 
the "Constellation as frigate" theory turned to a few sheets of paper 
housed in a few archives. 

What characteristics of traditional analog archives give them the 
power to authenticate? And how can this power be maintained in the 
digital world, both for archives and for other cultural heritage repos- 
itories in general? 

The Nature of Archives 

To understand why users turn to and trust information found in ana- 
log archives, it is necessary to understand the nature of archives. In 
the vernacular, the word archives has come to mean anything that is 
old or established, be it collections of old movies (such as the Pacific 
Film Archive), a journal that publishes what the editors hope will be 
papers of enduring value (for example, Virchows Archiv, the official 
journal of the European Society of Pathology), or even rock-and-roll 
oldies on cable television (in the VH1 Archives) (Maher 1997). Even 
information professionals have not been loath to extend the 
definition of archives beyond that found in the American Library 
Association (ALA) Glossary or other official lexicons when they speak of 
"digital archiving," a generic term for the preservation of electronic 
information. 

While archivists often inherit responsibility for old things, a col- 
lection of historic documents or artifacts, in and of itself, does not 
make an archives. A true archives is a contextually based organic 
body of evidence, not a collection of miscellaneous information. A 
manual written by Dutch archivists almost a century ago codified 
existing German and French archival theory and developed a mod- 
ern basis for archives. According to these authors, archives are "the 
whole of the written documents, drawings and printed matter, offi- 
cially received or produced by an administrative body or one of its 
officials ..." (Muller, Feith, and Fruin 1968). This definition has been 
adopted in one form or another by most of Western society. 

Found within this definition are the essential elements that de- 
fine an archives and are the source of much of its power to authenti- 
cate. First, archives consist of documents. For the Dutch, these docu- 
ments had to be written or printed; modern archivists extended the 
definition to include multimedia records, including sound record- 
ings and motion pictures. More recently still, archivists (and the 
courts) have added electronic records to the definition of documents. 
A recent court case even argued (unsuccessfully) that "cookies," the 
small transactional files created by many Web browsers when surf- 
ing the Internet, were government records when found on a comput- 
er used by a government official; others have argued that voice-mail 
messages are documents (Welch 1998). In short, archives consist of 
documents, regardless of their form.1 

The documents constituting a formal archives are further distin- 
guished by the fact that they have to have been officially produced or 
received by an administrative body. Such documents become 
records. According to the most recent glossary of archival terms, 
published by the Society of American Archivists, a record is a "docu- 
ment created or received and maintained by an agency, organization, 
or individual in pursuance of legal obligations or in the transaction 
of business" (Bellardo and Bellardo 1992). When someone requests a 
Social Security card, when a business reports its revenues for tax 



1 Of course, the question of what constitutes a "document" can be problematic 
(Buckland 1997). 
purposes, or when President Clinton issues a proclamation, 
documents are created. These documents are records because the agencies 
or officials involved in each transaction are fulfilling legal obligations 
as they conduct their business. Similarly, when a faculty committee 
approves tenure for an assistant professor, or when an organization 
issues an invitation to a meeting, a record is created. 

Note that under this definition, the archivist is not concerned 
about the value, accuracy, or utility of the content of the record. A 
document may contain lies, errors, falsehoods, or oversights — but 
still be evidence of action by an agency. Nor does a record have to be 
particularly interesting or important, or even something that anyone 
would ever want to consult again. Pure archival interest in records 
depends not on their informational content, but on the evidence they 
provide of government or business activity. As the Australian 
archivist Glenda Acland has noted, the "pivot of archival science is 
evidence, not information" (Acland 1992). 

For a time, the essence of records as evidence slipped from the center 
of the archival vision. Ironically, the challenges inherent in dealing 
with the most modern of records, electronic records, forced creative 
archivists to reinvestigate basic archival principles. Perhaps the 
most notable of these individuals is David Bearman, author of many 
publications on electronic records. His collection of essays on Elec- 
tronic Evidence: Strategies for Managing Records in Contemporary Orga- 
nizations is particularly noteworthy (Bearman 1994). Similar analysis 
has been conducted by the Australians Sue McKemmish, Frank Up- 
ward (McKemmish and Upward 1993), and Glenda Acland, and by 
the archival educators Luciana Duranti in Canada (Duranti 1998) 
and Margaret Hedstrom in the United States (Hedstrom 1995). All 
these authors have concluded to some extent that one can deal effec- 
tively with electronic records only if one returns to the first princi- 
ples of archival theory, including the importance of records as evi- 
dence. 

Records as evidence provide internal accountability for an agen- 
cy and make it possible for the agency to determine what it has done 
in the past. More important, archives — when they contain records 
that can serve as evidence — can force leaders and institutions to be 
accountable for their actions. Government archives that contain evi- 
dence of the actions of the government can ensure that the rights of 
individual citizens are protected.2 They can also provide evidence of 
when, where, and why the Navy might build and name a new ship. 

Records preserved as evidence may also be interesting because 
of their informational content. For example, census records retained 
in an archives because of the evidence they provide about the activi- 
ty of the Census Bureau, may be of great interest to genealogists. To 



2 These two themes — the ability of archives to hold public officials accountable 
and to protect the rights of individual citizens — form the basis of the new 
mission statement of the National Archives and Records Administration, i.e., "to 
ensure ready access to essential evidence [and note the emphasis on evidence] . . . 
that documents the rights of American citizens, [and] the actions of federal 
officials " 
many archivists, however, the fact that the Census Bureau creates 
census returns in the course of conducting its legally mandated busi- 
ness — not the information contained in the record — is of paramount 
importance.3 

At the heart of an archives, therefore, are records that are created 
by an agency or organization in the course of its business and that 
serve as evidence of the actions of that agency or organization. The 
agency or organization maintains those records for its business pur- 
poses. At the point when the records are no longer of immediate val- 
ue to the organization, it may elect to transfer its records to an ar- 
chives. The archives become responsible for maintaining the 
evidentiary nature of the materials after the records have left the con- 
trol of the agency that created them. 

One way in which archivists working with analog records have 
sought to ensure the enduring value of archives as evidence is 
through the maintenance of an unbroken provenance for the records. 
Archivists need to be able to assert, often in court, that the records in 
their custody were actually created by the agency specified. Further- 
more, the archivist must be able to assert that the records have been 
in the custody only of the agency or the archives. In an analog envi- 
ronment, the legal and physical transfer of the documents from the 
agency to the archives ensures an unbroken chain of custody. 

Archives truly exist only when there is an unbroken chain of cus- 
tody from the creating agency to the archives. For a government ar- 
chives, the transfer of custody is best accomplished as a matter of 
law. As Margaret Cross Norton, a pioneer theorist of American ar- 
chives, noted: 

We must disabuse ourselves of the concept that the acquisition by 
the state historical society of a few historical records . . . automat- 
ically transforms the curator of manuscripts into an archivist . . . 

An archives department is the government agency charged with 
the duty of planning and supervising the preservation of all 
those records of the business transactions of its government re- 
quired by law or other legal implication to be preserved indefi- 
nitely (Mitchell 1975). 

In a nongovernmental agency, policy can take the place of law if 
the policy identifies what records of business transactions need to be 
preserved indefinitely. Either law or policy, however, should govern 
the transfer of records to an archives. 

Why is the authorized transfer of a complete set of records to an 
archives with an unbroken chain of custody important? First, it helps 
maintain the evidentiary value of the records. An archivist can be 
called upon to testify in court about the nature of the records in his 
or her custody. That archivist would not be expected to testify as to 



3 While most archivists would agree with the definition of a record as presented 
in this paper, there are strong differences about what criteria should be used in 
the appraisal of records for retention or possible destruction. Some archivists 
argue that only the evidentiary value of the records should be taken into account; 
others argue that sociocultural requirements, including the need to establish 
memory, should be considered (Cook 1997; Cox 1994; Cox 1996). 
the accuracy of the contents of the records. However, he or she 
should be able to assert that on the day when the records left the cus- 
tody of the originating agency or organization, a particular docu- 
ment was included as part of the records. 

As important as unbroken custody in establishing the 
integrity of records is the completeness of the documents. Only records 
that are complete can ensure accountability and protect personal 
rights. As soon as records become incomplete, their authority is 
called into question. For example, when information is missing in a 
record, we do not know if it is because the information was never 
created or because it has been discarded. Individual records must be 
complete; they must contain all the information they had when they 
were created. They must also maintain their original structure and 
context. 

In addition to each individual record being complete, it is also 
necessary that the record series in which the record is created be 
complete. Because records gain meaning from their context, it is im- 
portant to know the nature of other records. Take the example of a 
case file. A case file is a record relating to one person as he or she in- 
teracts with a government agency. It might be an application for food 
stamps, an assessment of eligibility for veterans' benefits, or a re- 
quest for a reproduction of a photograph in an archives. By itself, a 
case file can tell the user a great deal, but it does not reveal whether 
the individual in question was treated differently from other people 
in the same situation. To understand a single record in context, one 
needs the whole series. There may be references from the case file to 
other records in the same series. Whenever possible, therefore, archi- 
vists seek to preserve entire series. 

This does not mean that archivists never throw anything away. 
The normal archival principle is to save only 2 to 4 percent of an or- 
ganization's records. What archivists try to avoid, however, is assess- 
ing individual records or parts of records. One either keeps the entire 
record or discards the entire record. Similarly, the normal presump- 
tion is that one either keeps or discards an entire series of similar 
records (though there may be times when the bulk of the records 
makes this impossible). 

Hilary Jenkinson, a leading archival theoretician, neatly summed 
up the importance of both the legal basis for the transfer of records to 
an archives and the need for completeness within the record series 
and the individual records. He noted the importance of authenticity 
to archives and defined it as the principle that archives are "pre- 
served in official custody . . . and free from suspicion of having been 
tampered with" (Jenkinson 1965). According to Jenkinson, the archi- 
vist's primary task is "to hand on the documents as nearly as possi- 
ble in the state in which he received them, without adding or taking 
away, physically or morally, anything: to preserve unviolated, with- 
out the possibility of suspicion, every element in them, every quality 
they possessed when they came to him" (Jenkinson 1984). 

Archivists have a responsibility to ensure the integrity of the 
documents even after they are legally transferred to a repository. In 
an analog environment, this is done by a number of mechanisms. 
Users of archives, for example, normally must work under the 
supervision of an archival staff member. The users are instructed to maintain 
the order of records as they are found and are cautioned against add- 
ing material to or removing it from the file. In some cases, especially 
when documents are known to be of great economic value, an archi- 
val staff member may count the documents delivered to and then 
returned by a researcher. (Normally, however, the volume of material 
in an archives works against any sort of item control.) 

The example of the Constellation illustrates both the promise and 
the dangers associated with the evidentiary power of traditional ar- 
chives. Some of the forged documents that seemingly proved that the 
ship in the Baltimore harbor had been built in 1797 were found 
among the records of the U.S. Navy located in the National Archives 
and Records Administration. Transfer of the records presumably 
took place under the legal authority of the Federal Records Act, and 
an unbroken chain of custody had been established. Users of the 
records, therefore, could assume that any documents found in the 
record series had been created and maintained by the Navy until 
they were transferred to the National Archives. The National Ar- 
chives then maintained the records as they were received from the 
Navy. The powerful presumption must be that documents found in 
the Navy files in the Archives are an accurate reflection of the Navy's 
files at the time of the transfer. Regardless of the content of the 
records, the organizational context alone would be enough to argue 
for their authenticity. 

We now know that in the case of the Constellation, it was wrong 
to presume that all of the documents in the Navy files, as they were 
found in archives, were authentic. Archivists had sought to preserve 
the records in the context of the office that had created them and they 
had accessioned a complete series into the archives. Normally, this 
would be enough to ensure the authenticity of the records. In this 
case, however, it was also necessary to turn away from the context of 
creation of the record and to examine the individual record itself. 

When Wegner, assisted by forensic document examiners at the 
FBI, examined the problematic documents, he found a number of el- 
ements within the documents that led him to question their authen- 
ticity. Since most of the documents were copies, it was not possible to 
test inks and papers. On the basis of the typeface on some of the doc- 
uments, however, the FBI could determine that the documents had 
been typed on typewriters that did not come into existence until 30 
years after the documents had supposedly been created. Other docu- 
ments were undated and unsigned, raising questions about their au- 
thenticity. In yet another instance, the investigators noticed 14 spelling 
and typographical errors in a simple document. The investigators 
knew that the office from which this document supposedly originat- 
ed had strict requirements for accuracy; the suspect document could 
not have originated in an office that enforced those requirements. 

Without realizing it, the investigators had used one of the oldest 
archival sciences to test the authenticity of the documents: the 
science of diplomatics. Diplomatics is a body of concepts and methods, 
originally developed in the seventeenth and eighteenth centuries, 
"for the purpose of proving the reliability and authenticity of 
documents." Over time it has evolved into "a very sophisticated system 
of ideas about the nature of records, their genesis and composition, 
their relationships with the actions and persons connected to them, 
and with their organizational, social, and legal context" (Duranti and 
Eastwood 1995, quoted in Duranti and MacNeil 1996). Perhaps be- 
cause diplomatics emerged from the need to understand and authen- 
ticate medieval charters, patents, and other legal documents, Ameri- 
can archivists knew little about the field until quite recently. In 
addition, the primary problem facing American archivists for most of 
this century has not been to understand individual documents but 
rather to deal with the flood of documents on paper and in other for- 
mats generated by a bureaucratic, paper-intensive society. 

Fortunately, in 1989 an Italian archivist teaching in Canada intro- 
duced North American archivists to the primary concepts of diplo- 
matics through a series of six articles published in the Canadian jour- 
nal Archivaria (Duranti 1998). In these articles and in her later work 
on reliability and integrity, Duranti expands on the interrelationship 
between the form, structure, and authorship of documents. The form 
of a record and the procedure for its creation, she asserts, determine 
the reliability of the record. A record is more likely to be reliable 
when its form is complete than when it is incomplete. While docu- 
ments can require many elements, the two most commonly required 
elements of form are the date and an element, usually a signature, 
that assigns responsibility to a person for the content of the record 
(Duranti 1995). 

Diplomatics also provides a mechanism for evaluating the au- 
thenticity of copies. Why is an original more reliable as evidence 
than a copy? It is because the original has the maximum degree of 
completeness and a higher degree of control in the procedure of cre- 
ation of the document. Creating a copy always introduces the possi- 
bility for variation or change from the original. 

On the other hand, there are times when a copy may be more 
reliable than an original. For example, a contract for the sale of a 
house that is copied into the deed books of a village government 
may be more reliable than the original, because a third, impartial, 
authority can attest to the agreement of the parties represented in the 
contract. Archives have a long tradition of producing authentic cop- 
ies, i.e., copies that have not been subject to manipulation, substitu- 
tion, or falsification after the completion of the process that created 
the original record. Such copies often entail a change in format (for 
example, from paper to microfilm) and require that procedures be in 
place to ensure the authenticity of the resultant copies. If the latter 
condition is met, archivists willingly discard the originals. 

An archivist could use the principles of diplomatics to judge the 
reliability and the authenticity of the individual documents in the 
Constellation case. For example, questioned documents that lacked a 
date or a signature would fail the fundamental test for reliability. The 
document filled with misspellings and typographical errors would 
also fail. The form of a document that does not follow the documen- 
tary conventions of the creating office is suspect; the document itself 
may be unreliable. 
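
The test of form described above can be pictured as a simple check. The 
following is only a minimal sketch in Python, assuming a hypothetical 
QuestionedDocument description; the class, its fields, and the function 
names are illustrative and are not drawn from Duranti's work or from the 
Constellation investigation. It covers only the two most commonly 
required elements of form, the date and the signature. 

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QuestionedDocument:
    """Hypothetical description of a document under examination."""
    title: str
    date: Optional[str] = None        # date as written on the document, if any
    signature: Optional[str] = None   # person taking responsibility, if any


def missing_form_elements(doc: QuestionedDocument) -> List[str]:
    """List the commonly required elements of form that the document lacks."""
    missing = []
    if not doc.date:
        missing.append("date")
    if not doc.signature:
        missing.append("signature")
    return missing


def passes_form_test(doc: QuestionedDocument) -> bool:
    """A document passes this (simplified) test only if nothing is missing."""
    return not missing_form_elements(doc)
```

A document that passes such a check is not thereby shown to be authentic; 
the check only flags documents whose incomplete form makes them suspect 
and calls for closer examination. 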

In summary, traditional archival theory has developed two 
approaches for ensuring the authenticity of the document. The first 
approach, the basis for most American archives, seeks to understand 
and control the context in which records are created. Records that are 
generated in an agency, transferred by law or policy to an archival 
agency through an unbroken chain of custody, and maintained 
complete and inviolate by that archival agency are presumed to be 
authentic. The second approach, as exemplified in the works of Du- 
ranti, focuses on the individual record: its form and the circumstanc- 
es of its creation. Together, these two approaches are used to ensure 
the authenticity of records in the analog world. 



Archival Authenticity in a Digital World 

The archival profession has established a theoretical base to justify 
the assertion of authenticity when dealing with analog records. But 
will the principles that have worked so well in the analog environ- 
ment transfer to the new digital world? Wendy Duff has noted, "As 
records migrate from a stable paper reality to an intangible electronic 
existence, their physical attributes, vital for establishing the authen- 
ticity and reliability of the evidence they contain, are threatened" 
(Duff 1996). The ease with which records in electronic form can be 
created, transferred, and modified only heightens the importance of 
maintaining their integrity. The central question facing all archivists, 
therefore, is how to ensure the authenticity of records in digital form. 
Can the traditional archival methodologies developed for analog 
records be used for digital records? Or must new methodologies and 
techniques be developed to ensure that the archival records remain 
authentic over time? 

A number of important initiatives are under way to explore how 
the integrity of records can be preserved in a digital environment. 
None of the strategies has yet become widely accepted, primarily be- 
cause they have not been tested in the field. As Philip Bantin has con- 
cluded, "In short, there are no clear-cut answers available yet, but 
there are plenty of very good ideas and emerging strategies out 
there" (Bantin 1999). Two of the more promising approaches can be 
summarized here. 



The University of Pittsburgh Functional Require- 
ments for Evidence in Recordkeeping Project 

The University of Pittsburgh conducted one of the first and most 
extensive research projects that sought to identify the functional 
requirements for the preservation of electronic evidence. Its project, the 
"Functional Requirements for Evidence in Recordkeeping," consisted 
of three main components. First, the project identified the 
functional requirements for recordkeeping in a variety of communities. 
The project recognized that groups other than archivists (e.g., the le- 
gal, medical, and business communities) also had need for authentic, 
reliable records. Laws, standards, customs, and the best practices of 
each community contain the justifications for recordkeeping. To 
ensure that electronic records meet the needs of those communities (i.e., 
that they become what the project identified as "business acceptable 
communications"), one must identify the requirements for record- 
keeping in each community and then establish metadata that meet 
those requirements. The project did this by establishing the record- 
keeping requirements and practices of organizations — the "literary 
warrant" (Duff 1996; Bearman 1996). 

Using the requirements necessary for literary warrant, the 
project then produced a general specification of the attributes of evi- 
dentiality. The specification consists of 13 properties that are catego- 
rized into three groups. The first group requires a conscientious orga- 
nization that complies with legal and administrative requirements for 
recordkeeping. The second group specifies the requirements for ac- 
countable recordkeeping systems, including policies, assigned responsi- 
bility, and formal methodologies for their management and accurate 
and complete documentation. The Pittsburgh system presupposes 
that accountable recordkeeping systems are used at all times in the 
normal course of business. The third group defines the requirements 
that relate to the record itself, specifically how the record is created or 
captured, how it is maintained, and what is necessary for the record 
to be used. 

In addition to developing the general specification of the require- 
ments for evidentiality, the Pittsburgh project developed a set of pro- 
duction rules to express formally each functional requirement. David 
Bearman, a consultant on the project, has turned the production rules 
and general analysis into a set of metadata requirements. The goal is 
to be able to create records that are encapsulated metadata objects: 
content in an envelope of metadata that ensures the authenticity, in- 
tegrity, reliability, and usability of the content. 
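
The idea of content carried in an envelope of metadata can be sketched 
in a few lines of Python. This is an illustration under stated 
assumptions, not the Pittsburgh metadata specification: the 
EncapsulatedRecord class, its field names, and the use of a SHA-256 
digest as an integrity check are all hypothetical choices made here. 

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EncapsulatedRecord:
    """Hypothetical envelope: record content plus descriptive metadata."""
    content: bytes
    creator: str       # who created the record
    created: str       # when it was created
    transaction: str   # business transaction the record documents
    sha256: str        # digest of the content taken at capture time


def encapsulate(content: bytes, creator: str, created: str,
                transaction: str) -> EncapsulatedRecord:
    """Wrap content in its metadata envelope, recording a digest of the content."""
    digest = hashlib.sha256(content).hexdigest()
    return EncapsulatedRecord(content, creator, created, transaction, digest)


def content_unchanged(record: EncapsulatedRecord) -> bool:
    """Check that the content still matches the digest stored in the envelope."""
    return hashlib.sha256(record.content).hexdigest() == record.sha256
```

A digest of this kind can show only that the content has not changed 
since capture. It says nothing about who created the record or why, 
which is why the envelope in this sketch also carries contextual 
metadata. 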

Implicit in the Pittsburgh approach is the assumption that "re- 
cordness" and "evidentiality" (the elements that determine the trust- 
worthiness of records in business and legal settings) can be main- 
tained in an electronic system only if the requisite functionality is 
built into the record system from the start. Several efforts have been 
made to implement the Pittsburgh model, most notably in projects 
under way at Indiana University, a Swedish pharmaceutical compa- 
ny, and the City of Philadelphia, but there is no consensus whether 
the Pittsburgh project has identified the true functional requirements 
for authenticity. Some worry that the Pittsburgh model may be too 
complex, and hence too costly, to implement. Furthermore, it presup- 
poses radical changes in how documents are generated. For example, 
if one wishes to write a report, one currently opens a word processing 
package and begins writing. The Pittsburgh system seems to propose 
that in the future one would open instead a report-writing 
module. The module would "know" who you are, what your authority 
for writing the report is, and in what format you are writing the 
report. The software would automatically encapsulate each draft of 
the report with this management information. While highly desirable, 
or even mandatory, to ensure the authenticity of the electronic file, 
such an approach does not reflect how people currently use 
software. 



University of British Columbia Preservation of the 
Integrity of Electronic Records and InterPARES Projects 

Two projects at the University of British Columbia (UBC) are investi- 
gating the integrity of digital information over time. The first project, 
"Preservation of the Integrity of Electronic Records," sought to iden- 
tify the best methods for preserving the reliability and authenticity of 
electronic records over time. The UBC analysis determined that ge- 
neric information systems designed to collect, process, store, and dis- 
seminate information lack some of the functionality needed to pro- 
duce, maintain, and preserve reliable electronic records. For example, 
most current systems do not adequately relate the content of records 
to business transactions. They also lack sufficient metadata to moni- 
tor the creation and maintenance of records in a way that ensures 
they will be both reliable and understandable when retrieved in the 
future. The project concluded that reliability and authenticity of elec- 
tronic records are best ensured when procedural rules for record- 
keeping are embedded into the overall records system. This finding 
is similar to that of the Pittsburgh project, which expressed an inter- 
est in building into systems the automatic capture of the metadata it 
has determined are needed to ensure the recordness of the data (Du- 
ranti and MacNeil 1996; Hedstrom 1996). 

In other ways, however, the UBC project was fundamentally dif- 
ferent from the Pittsburgh project (Duranti and MacNeil 1996; Bantin 
1999; Marsden 1997). For example, the analysis of the requirements 
for recordkeeping in the two projects differed greatly. The Pittsburgh 
project based its analysis on literary warrant, whereas the UBC 
project's analysis was based on diplomatics and archival theory. 

In part because of the difference in starting points, the two 
projects reached fundamentally different conclusions in some areas. 
One of the most striking differences relates to the role of the archives 
in ensuring authenticity. The Pittsburgh project did not assume that 
an archives is needed to ensure the preservation and authentication 
of records. In the Pittsburgh system, it is the metadata, not the custo- 
dial agency, that determine the authenticity of records. Records can, 
and in most cases should, remain in the custody of the agency that 
created them. As one of the Pittsburgh project members has argued, 
"Archivists cannot afford — politically, professionally, economically, 
or culturally — to acquire records except as a last resort . . . Indeed, 
the evidence indicates that acquisition of records and the mainte- 
nance of the archives as a repository gets in the way of achieving ar- 
chival objectives and that this dysfunction will increase dramatically 
with the spread of electronic communications" (Bearman 1991). The 
UBC project, in contrast, placed archives at the heart of the authentication
system for electronic records, in a fashion similar to the role
played by archives in protecting and authenticating paper records. 
This project concluded that "the routine transfer of records to a neu- 
tral third party, that is, to a competent archival body, invested with 
the exclusive authority and capacity for the indefinite preservation of 
inactive records, is an essential requirement for ensuring their au- 
thenticity over time" (Duranti and MacNeil 1996). 

The "Preservation of the Integrity of Electronic Records" project 
at UBC sought to establish a theoretical framework based in tradi- 
tional archival principles for the authentication of digital informa- 
tion. A follow-on project is now seeking to put some of these princi- 
ples into action. The InterPARES (for "International Research on 
Permanent Authentic Records in Electronic Systems") project is an 
international collaboration spearheaded by UBC. Its goal is to use the 
tools of archival science and diplomatics to develop the theoretical 
and methodological knowledge essential to the permanent preserva- 
tion of inactive electronically generated records. It will then formu- 
late model strategies, policies, and standards capable of ensuring the 
preservation of those records. The InterPARES project has generated 
great interest in the archival community, in part because it is based 
on familiar principles and practices. The community eagerly awaits 
reports of its findings. 



Conclusion 

It is not possible at this early stage to say whether Pittsburgh or UBC 
has the better approach for ensuring the authenticity of records. Both 
approaches need to be tested in the field (Bantin 1999). As Margaret 
Hedstrom has noted, "What we lack is an evaluation of the useful- 
ness of these findings from the perspective of organizations that are 
responsible in some way for preserving and providing access to elec- 
tronic records. We need assessments from the administrators of ar- 
chival and records management programs about the feasibility of 
putting the proposed policies and models into practice. We need
reactions from people outside the archival community especially
where related research and projects are being conducted" (Hedstrom 
1996). 

In the interim, however, it is easy to speculate that some combi- 
nation of the Pittsburgh and UBC approaches will come to dominate. 
The Pittsburgh project's basis in the actual documentary requirements
of different communities is very appealing, and its aim of
capturing administrative metadata from the very moment of
creation is highly desirable.

On the other hand, it is unlikely that all information of interest to 
future users of records systems will be found in records creation 
management systems fully compliant with the Pittsburgh metadata. 
Scholars will be willing to access, use, and evaluate the information 
found in the electronic files, regardless of whether the actual data 
convey the true quality of "recordness." An archival purist might insist
that if information is not stored in a record keeping system, then
the information cannot be a record and therefore should not be part 
of the archival record. In reality, however, our repositories are filled 
with interesting information that may not meet the formal definition 
of "record" or may not have been created with a record keeping sys- 
tem in mind. 

A good example of how material that is not formally a record can 
be valuable to the researcher is the famed PROFS case (Bearman 
1993). PROFS refers to a proprietary IBM communication system 
used in the White House under Presidents Reagan and Bush. Because
they were system back-up tapes, the PROFS tapes lacked even the 
rudiments of record keeping functionality. Nevertheless, a consor- 
tium of historical groups sued for the release of the tapes. In the ab- 
sence of controlled records, the information on the back-up tapes 
was the best the researchers could find. For researchers, the value of 
the tapes was great because they were still held by the agency and 
were surprisingly complete. However, even if only selections of the 
e-mail messages had survived and were located only in nongovern- 
mental repositories, researchers would still try to use them, even 
though their authenticity was more questionable. 

In short, social mechanisms of control promise to be the funda- 
mental basis for the establishment of digital authenticity. It would be 
desirable if all digital information consisted of true records created in 
a system that encapsulates with the record the information needed to 
maintain the evidential value of the records. For most digital infor- 
mation, however, the fact that it is in an archives, an unbiased third 
party, will have to suffice. As with the paper records used in the Con- 
stellation example, the fact that digital information is found within a 
trusted repository may become the base upon which all further as- 
sessments of authenticity build. 

Even if the physical presence of digital data in a trusted reposito- 
ry is the basis for future assessments of authenticity, archivists will 
still need to associate with those digital documents metadata that 
researchers can use to understand and assess digital information. We 
need self-conscious documentation by the creators and preservers of 
digital representations that details the methods employed in making 
and maintaining the representations. We also need to know what re- 
searchers need to know about the transformation from analog to dig- 
ital format, as well as about any transformations that may occur as 
digital data are preserved. To determine the latter, we need to under- 
stand the "digital literacy" that future researchers will need "to as- 
sess digital information, identify known artifacts introduced by par- 
ticular processes, and correctly identify as yet unknown sources of 
distortion" (Bearman and Trant 1998). Only by understanding the 
interactions between researcher and document and records and re- 
positories will we be able to convey into the future the trust mecha- 
nisms of the paper world. 
Where’s Waldo?

Reflections on Copies and 
Authenticity in a Digital Environment 

by David M. Levy 



Introduction 

You have probably seen the "Where's Waldo?" children's
books. Each double-page spread contains drawings of hundreds
of cartoon figures. Your job is to find Waldo, a character
who is always dressed in a red-and-white striped woolen cap and
shirt and is wearing glasses. Often there are characters who look a lot 
like him, but if you look closely you can see that some detail or other 
is wrong (e.g., it is a woman, the cap is solid red). In other words, 
only one of the figures on the page is the real Waldo; the rest are im- 
postors, look-alikes, or close matches. "Pay attention," these draw- 
ings seem to say, "Appearances can be deceiving." 

Waldo presents the problem of authenticity in graphical form. 
Although a number of the cartoon figures seem to be Waldo, only one 
is the authentic Waldo. Being authentic in this case means being who 
or what you seem or claim to be. In Waldo's case, there can only be 
one right answer, since we are talking about a unique individual. But 
in other cases, there may be more than one right answer. This hap- 
pens when we are concerned with, say, group membership (being a 
medical doctor) or with types (being a 1956 Chevy). It is only because
we live in a world of multiplicity, where several people or
things may appear to be the same, that duplicity is possible. Judgments
of authenticity, as I understand them, allow us to navigate
the world by distinguishing genuine multiplicity from duplicity.

In the realm of written forms — in the world of paper and other 
tangible media — we have, over the centuries, developed elaborate 
procedures for identifying authentic documents and for ferreting out 
impostors. In the digital realm, we have barely begun to do this, and 
there are many technical and social challenges to be met. One chal- 
lenge comes from the fact that the digital realm produces copies on 
an unprecedented scale. It is a realm in which, as far as I can tell, 
there are no originals (only copies, lots and lots of them) and no enduring
objects (at least not yet). This makes assessing authenticity a
challenge. 



What Are Documents? 

I use the word document where others might use text, record, informa- 
tion-bearing artifact, or written form. "Document" is a cover term for a 
large group of artifacts, including textual materials, whether hand- 
written or mechanically realized; graphics and photographs; and au- 
diovisual presentations. But by what criterion do all these things fit 
into a single, coherent category? 

I have come to understand documents by analogy with human 
beings. Documents are surrogates for people. They are bits of the 
material world (stone, clay, wood pulp, and now silicon) that we cre- 
ate to speak for us and take on jobs for us. A receipt bears witness to 
and thereby validates a financial transaction; a restaurant menu 
speaks for the establishment, the restaurant; a novel tells a story; a 
political flyer speaks for a candidate or political organization; and so 
on.1 By saying that documents "speak," I do not mean to limit them
to textual or verbal materials. Pictures, drawings, diagrams, moving 
images, and other conventional forms of communication also speak 
in the metaphorical way in which I am using the term: they commu- 
nicate, they tell us things about the world. And when I say docu- 
ments "take on jobs," I am referring to the way we tailor their form 
and content to particular tasks and contexts. Genre (whether a receipt,
a menu, a novel, or a flyer) is, in effect, the clothing of conventional
content to do particular tasks in the world (to witness a financial
transaction, recite the dishes available and their prices, etc.)
(Levy 1999).

For a document, speaking per se is not enough. It also must be 
able to speak reliably. We depend on documents to carry messages 
through space and time. In many cases, this reliability is achieved 
through fixity: letterforms inked on paper can survive for long peri- 
ods of time. But with newer media, such as video, this reliability is 
achieved not by fixity but by repeatability. The moving images on a 
video screen are by their very nature transient. I will never be able to 
see those very images again. But I can play the tape repeatedly, each 
time seeing a performance that, for all practical purposes, is "the 
same as" the one I saw the first time. 

If documents are meant to be reliable surrogates for human be- 
ings, then it makes perfect sense that we would be critically con- 
cerned with their authenticity. Steven Shapin (1994), a sociologist, 
argues that human social order — that human life itself — is funda- 
mentally based on trust, i.e., on our ability to rely on one another. 



1 There are great complexities and ambiguities regarding who is speaking in or 
through a document. In literature, for example, distinctions have been made 
between the narrator, the implied author, the "real" author, etc. Such 
complexities and ambiguities also exist, however, when a human being is 
speaking. 
"How could coordinated activity of any kind be possible if people
could not rely upon others' undertakings? No goods would be handed
over without payment, and no payment without goods in hand.
There would be no point in keeping engagements, nor any reason to 
make engagements with people who could not be expected to honor 
their commitments," he writes. Much as we rely on one another, we 
also have come to rely on documents in the making and maintaining 
of a shared, stable, social order. So it is no accident that words such 
as trust, reliability, and truthfulness, which are fundamentally social, 
would apply to documents as much as to people. It is likewise no 
accident that documents, as surrogates for us, would be accountable 
in the same terms. 

What Is a Copy? 

I worked for Xerox for a number of years, so it should hardly be sur- 
prising if some of my thinking and my examples come from the 
world of photocopying. In that world, "to make a copy" means to 
put one or more pieces of paper on the photocopier platen or in the 
RDH (recirculating document handler) and push the Big Green But- 
ton. What comes out at the other end of the machine is a "copy." In 
this context, a copy is something that is the result of a process of copy- 
ing. It says nothing about whether the result is a good copy or a bad 
copy, or whether or not it is useful. 

But there is a second notion of copy, which has more to do with 
the product than the process. To be a copy in this sense is to stand in a 
certain relation to an original, that is, to its origin. To be a copy in this 
sense is to be faithful to the original. The definition of "faithful," 
however, depends on the circumstances in which the copy is being 
made and on the uses to which it will be put. The context of use, in 
other words, determines which properties of the original must be 
preserved in the copy. Does it matter that I have just made a photo- 
copy of a signed will? It depends on what I intend to do with it. If it 
is for informational purposes (to show you what my will says), then 
it is an adequate copy; for some legal purposes, however, it won't do. 

The point is, a document can be identical only with itself, if "iden- 
tical" is taken to mean "the same in every respect." When we say 
that something is "the same," we generally mean one of two things. 
We either mean that it is "the very same" thing (as in "This is the 
same car I drove yesterday") or that it is "of the same type" as some- 
thing else ("I read that same book last year"). It is this second notion 
of sameness — sameness of type, sameness in virtue of sharing certain 
properties — that is at issue in copying (Levy 1992). 

Even an extremely high-fidelity copy will be different from the 
original in innumerable ways, because to copy is to transform. The 
copy will be on a different piece of paper that has its own unique 
properties. The process of photocopying will make letterforms thick- 
er or thinner than those on the original, and will make images lighter 
or darker; it will add noise or remove it; it will change tones, shapes, 
aspect ratios, and so on. Differences will always be introduced in 
copying; the trick is to regulate the process sufficiently so that the
resulting differences are of little or no consequence and that the 
properties of greatest consequence are shared. Determinations of 
which properties matter are made in the context of purpose and use. 



Copying Without an Original 

I have presented a simple and straightforward notion of copying. 
Although I have used the photocopier to illustrate how it works, this 
notion is not dependent on any particular technology. Making a copy 
by hand embodies the same idea. Moreover, although I have talked 
about making a single copy, one can obviously make multiple copies 
of an original — an indefinite number, in fact. It is common for some- 
one to create a "master" document and to produce any number of 
copies from it. What is crucial in this scheme is that there is an origi- 
nal from which the copies are made. 

But there is another scheme — one that does not require an origi- 
nal. It is a manufacturing technique, a means of producing a large 
number of artifacts from a single source. If you want to make coins, 
for example, you can create a mold and pour molten metal into it to 
cast the coins. This is also the way the printing press works. You cre- 
ate a set of printing plates that are used to produce inked pieces of 
paper. 

The reason I say there is no "original" in this technique is that 
the source 2 from which the copies are made (the mold or the printing 
plate) is a very different kind of thing than the copies.3 You cannot 
spend the mold (although you may be able to mint more coins); you 
would not normally choose to read the text on the printing plate. 

This means that the word copy is being used in a somewhat different 
sense. It perhaps harks back to the root meaning of the word (copi- 
ous, plentiful). But there is another sense in which the artifacts pro- 
duced in this way are copies: They are copies of one another. Indeed, 
to a large extent, the purpose of this technique is to manufacture a 
set of "identical" artifacts — artifacts that are all "the same," that is, of 
the same type. These artifacts are identical in the sense that they are 
interchangeable with one another for certain purposes. 

The examples I have given so far involve the production of en- 
during physical artifacts, or things. But this method of copying from 
a source also works for producing activities or events, which by their 
very nature are transient. Consider the case of a play, where a script 
(the source) serves as the basis for a number of performances (the 
copies) or an audio or videotape (the source), which leads to the real- 
ization of sounds or visual images, or both. 

2 I will use the word source to designate the thing from which copies are made in
this method, and the word original when I mean something that is of the same 
kind as the copies. 

3 I do not mean to suggest that there can never be an original that is used to guide
the making of the source. I may print an edition of Leaves of Grass, taking the text 
from the 1891 edition. In this case, some actual printed copy of the 1891 edition is 
my original. Nevertheless, the production of my new edition is mediated by the 
printing plates I have created, and these plates are not an original. 
In none of these cases, however, is the source ever enough. Manufacturing
the intended artifacts also requires a complex of skills,
know-how, and, often, technical equipment. The mold for coins is 
useless without the right metals and the skill to do casting; a printing 
plate is useless without a printing press and knowledge of how to 
use it; the script needs a cast of actors; and the videotape needs a 
video player. In each case, the quality of the product or the perfor- 
mance depends on a skillful and properly executed process of pro- 
duction. The source, in other words, does not and cannot fully speci- 
fy the properties of the things it is used to make. There is a division 
of responsibility between the source and the environment in which it 
operates. 

It is worth comparing print with analog audio or video recording 
before talking about the digital case. In the case of printing, the 
source is used to produce a definite number of copies, an edition. 
Each copy in an edition is a stable physical object whose existence is 
independent of the source. But in the case of the recording, when the 
tape is defined as the source, there is no notion of a definite number 
of copies (e.g., replayed performances); rather, once you have the 
tape and an appropriate player, you can produce a (relatively) unlim- 
ited number of copies, or performances. Moreover, unlike the prod- 
ucts of print, the copies are completely dependent on the source for 
their existence. Should the tape be damaged or lost, there will be no 
more performances. This gives the source a greater importance in the 
case of recordings. You have to preserve it if you want copies in the 
future. (And, of course, you have to preserve the player, which is the 
means of making copies from the source.) In the case of printing, by 
contrast, once the source has done its work, it is no longer needed. 
(Indeed, the advantage of movable type is that it can be reused, i.e., 
the elements of the source can be recycled.) 

Digital Documents 

Like printed documents and recorded audio and video performanc- 
es, digital documents are founded on a distinction between a source 
and the copies produced from it. The source is a digital representa- 
tion of some kind, a collection of bits. The copies are the sensible im- 
pressions or manifestations — text, graphics, sound, whatever — that 
appear on paper, on the screen, and in the airwaves. Getting from the 
source to the copy requires a complex combination of technical and 
social environment, including an elaborate configuration of hard- 
ware and software. 

In one sense, digital technologies are very much modeled on the 
printing press. They allow users to create what amount to digital 
printing plates from which they can "print" an arbitrary number of 
copies. The relation with traditional print is particularly strong when 
the copies produced are textual and graphical in nature, as is so 
much of the material on the Web today. But digital documents, even 
those with textual content, share significant features with analog au- 
dio and video recordings as well. With audio and video, we tend to 
think of the source (in this case, the audio or videotape) as more permanent
than the copies produced from it (the performances), which
are inherently transient. Currently, we seem to be importing this 
same hierarchy of permanence into the digital domain. We think of 
the digital source (such as a Microsoft Word file) as more permanent 
than the text and images that appear on the screen. This makes sense, 
because we know how to "save" the file. When we have done so, it 
will typically survive on a hard drive or a floppy despite power loss, 
whereas the screen image cannot. But as we adopt this way of think- 
ing, we are also coming to treat paper copies (analogous to screen 
images) as more transient than the source file. We often print out a 
paper copy to read and then toss it away, confident that we will be 
able to print out another as long as we have the file. But the truth is, 
at least for the moment, that paper has a better chance of survival 
than a digital source. 

Indeed, digital entities are generally less stable than their coun- 
terparts on paper and other tangible media, and digital production 
tends to yield much greater variability of product than analog pro- 
duction does. In the case of print, once we have the plate and a press, 
the amount of variability is limited. This is even more the case
with an analog recording: once we have the tape and an appropriate 
player, the amount of variability in performances is typically fairly 
well constrained. The differences generally are limited to minor vari- 
ations in quality. For digital copies, however, there is likely to be a 
much greater range of variability. Some of the variability is intention- 
al and it is a great strength of the technology. We can easily edit digi- 
tal documents and quickly produce variants. Some variability is un- 
intended and is an unresolved problem: digital copies are extremely 
sensitive to the technical environment, to the point that features we 
would like to preserve in subsequent copies may be hard (or impos- 
sible) to maintain. Displaying the file on a different computer may 
lead to font substitutions, different line breaks, and so on. These 
same sorts of variability may even occur on the same computer if, in 
the interim, the environment has changed in some crucial way.4 Consequently,
two different viewings of the "same" source may differ in
important ways — they may not be "the same." 
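
A small sketch may make this concrete. It is an illustration only: the
function names and the two "environments" (line widths of 40 and 28
characters) are assumptions, and real rendering environments differ in
far more than line length. The point it shows is narrow. The source bits
are identical, as a digest confirms, yet the perceptible copies produced
from them are not.

```python
import hashlib
import textwrap


def source_digest(source: bytes) -> str:
    """Fingerprint of the digital source (the bits), not of any rendering."""
    return hashlib.sha256(source).hexdigest()


def render(source: bytes, line_width: int) -> list[str]:
    """Produce a perceptible copy; line_width stands in for the environment."""
    text = source.decode("utf-8")
    return textwrap.wrap(text, width=line_width)


SOURCE = ("Differences will always be introduced in copying; "
          "the trick is to regulate the process.").encode("utf-8")

copy_a = render(SOURCE, line_width=40)   # one hypothetical environment
copy_b = render(SOURCE, line_width=28)   # a second hypothetical environment

same_source = source_digest(SOURCE) == source_digest(SOURCE)
same_copy = copy_a == copy_b

print("same source bits:", same_source)
print("same rendered copy:", same_copy)
for line in copy_a:
    print("A |", line)
for line in copy_b:
    print("B |", line)
```

Whether the differing line breaks matter depends, as with photocopies,
on the purpose and use to which the copy will be put.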

Under such circumstances of radical variability, there does not 
appear to be anything like a stable document or object. Over time, 
the digital source may move from server to server. The version that 
ends up on your local computer may have been copied from a server 
and will likely have undergone further transformation; for example, 
your local browser or editor may generate other local, and possibly 
partial, digital sources in the process of creating something you can 
actually see. What you do see at any given moment will be the prod- 
uct both of the local digital source and of the complex technical envi- 
ronment (hardware and software), which is itself changing in com- 
plex and unpredictable ways. The digital source, the perceptible 



4 As sound and motion are digitally recorded, issues of uncontrolled variability 
will increasingly arise here, too. 
copies, and the environment are all undergoing change in ways that
no one yet knows how to control. 

Authenticity in a Digital Environment 

Assessments of authenticity in the world of paper and other stable, 
physical media rely heavily on the existence of enduring physical 
objects. If you want to determine whether the document in front of 
you is the unique individual it purports to be (someone's last will 
and testament, for example), you can try to determine its history. But 
you can do this only because it has a history, an extended existence in 
time. If you want to determine the authenticity of something that is 
one of many (a member of an edition, for example) you can compare 
it with another copy, a reference copy. And even where the thing in 
question is transient (such as the performance of a play), you still 
may be able to make use of a stable reference object (such as the 
script). In all these cases, either the object in question or a reference 
object has an enduring, physical existence that helps ground the de- 
termination of authenticity.5 

What happens in the digital case if there are no stable, enduring 
digital objects? One possibility is that we will find a way to create 
them. In one current view, objects are at least in part socially con- 
structed; they are bounded and stabilized through social interaction 
(Smith 1996). Literary works (e.g., Hamlet) are a clear example of this. 
Although we cannot really say what works are, we have nonetheless 
created a cultural mechanism (copyright and the courts) to help us 
decide where the boundaries between works lie. Here there can be 
no question of ultimate, natural answers — only social answers based 
on law and politics. In the digital domain, I see Jeff Rothenberg's 
proposal (in this collection) to stabilize digital environments through 
emulation as one attempt to create stable digital objects. (I am not 
sure it is a workable solution, but that is another matter.) 

Without the security of stable digital objects, what might we do? 
One possibility would be to maintain audit trails, indicating the se- 
ries of transformations that has brought a particular document to the 
desktop. Such a trail (akin to an object's provenance) could conceiv- 
ably lead back to the creation of the initial document or, at least, back 
to a version that we had independent reasons to trust as authentic. 
Having such an audit trail (and trusting it) would allow us to decide 
whether any of the transformations performed had violated the doc- 
ument's claimed authenticity. A second possibility would ignore the 
history of transformations and would instead specify what proper- 
ties the document in question would have to have to be authentic. 
This would be akin to using a script or a score to ascertain the au- 
thenticity of a performance. 
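
The two possibilities can be sketched in code, with the caveat that
everything here is an assumption made for illustration: the names
(Step, append_step, trail_is_consistent, has_required_properties), the
use of SHA-256 digests, and the idea of expressing required properties
as phrases that must be present. The first part links each recorded
transformation to the one before it, so that a trail can be checked
from a trusted starting version to the document on the desktop. The
second part ignores history and tests the document against a
specification.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Step:
    """One recorded transformation (all names are hypothetical)."""
    action: str
    input_digest: str
    output_digest: str
    prev_entry_digest: str   # links this entry to the one before it

    def entry_digest(self) -> str:
        payload = json.dumps([self.action, self.input_digest,
                              self.output_digest, self.prev_entry_digest])
        return digest(payload.encode("utf-8"))


def append_step(trail: list, action: str, before: bytes, after: bytes) -> None:
    prev = trail[-1].entry_digest() if trail else ""
    trail.append(Step(action, digest(before), digest(after), prev))


def trail_is_consistent(trail: list, trusted_start: bytes, current: bytes) -> bool:
    """Walk from a version we trust to the document now in hand."""
    expected_input, prev = digest(trusted_start), ""
    for step in trail:
        if step.input_digest != expected_input or step.prev_entry_digest != prev:
            return False
        expected_input, prev = step.output_digest, step.entry_digest()
    return expected_input == digest(current)


def has_required_properties(document: bytes, required_phrases: list) -> bool:
    """Second approach: ignore history and test against a specification."""
    text = document.decode("utf-8")
    return all(phrase in text for phrase in required_phrases)


v1 = b"I leave my  books to the library."
v2 = v1.replace(b"  ", b" ")            # normalize spacing
v3 = v2 + b"\n"                         # add a trailing newline
trail = []
append_step(trail, "normalize spacing", v1, v2)
append_step(trail, "add trailing newline", v2, v3)

altered = v3.replace(b"library", b"nephew")
print(trail_is_consistent(trail, v1, v3))
print(trail_is_consistent(trail, v1, altered))
print(has_required_properties(v3, ["books", "library"]))
```

Neither check removes the need for trust. The trail is only as good as
our confidence in whoever kept it and in the starting version, and the
specification is only as good as the judgment about which properties
matter.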



5 How do we know whether to trust the authenticity of reference objects? The 
whole process recurses. I agree with Clifford Lynch, who suggested in his 
presentation at this workshop that the process is ultimately grounded in our trust 
of others. The "buck stops" when we accept someone's (or some institution's) 
claim that some object in the chain of reasoning is authentic. 
Conclusion

I have no conclusion other than this: Understanding what we want to 
accomplish, and what we can accomplish, with regard to authenticity 
in the digital realm will take considerable effort. If nothing else, this 
workshop has convinced me of the cultural importance, as well as 
the difficulty, of the work that lies ahead. 
Authenticity and Integrity in the Digital
Environment: An Exploratory Analysis 
of the Central Role of Trust 

by Clifford Lynch 



Introduction 

This paper seeks to illuminate several issues surrounding the
ideas of authenticity, integrity, and provenance in the net- 
worked information environment. Its perspective is pragmat- 
ic and computational, rather than philosophical. Authenticity and 
integrity are in fact deep and controversial philosophical ideas that 
are linked in complex ways to our conceptual views of documents 
and artifacts and their legal, social, cultural, and historical contexts 
and roles. (See Bearman and Trant [1998] for an excellent introduc- 
tion to these issues.) 

In the digital environment, as Larry Lessig (1999) has recently 
emphasized, computer code is operationalizing and codifying ideas 
and principles that, historically, have been fuzzy or subjective, or 
that have been based on situational legal or social constructs. Au- 
thenticity and integrity are two of the key arenas where computa- 
tional technology connects with philosophy and social constructs. 
One goal of this paper is to help distinguish between what can be 
done in code and what must be left for human and social judgment 
in areas related to authenticity and integrity. 



This paper has been modestly revised based on discussion at the workshop and a 
reading of the other papers presented there. All of the papers, but particularly 
those of David Levy and Peter Hirtle, raise important issues that are relevant to 
the topic of this article. From Hirtle's paper, I had the opportunity to learn 
something of the science of diplomatics, and at the workshop, I had the 
opportunity to learn much more from Luciana Duranti. Her book, Diplomatics:
New Uses for an Old Science (1998), offers valuable and fresh insights on the topics
discussed here. These other works provide important additional viewpoints that 
are not fully integrated into this paper and I urge the reader to explore them. My 
thanks also to the participants in the Buckland/Lynch Friday Seminar at the
School of Information Management and Systems at the University of California, 
Berkeley, for their comments on an earlier version of this paper. 

Gustavus Simmons wrote a paper in the 1980s with the memorable
title "Secure Communications in the Presence of Pervasive Deceit."
The contents of the paper are not relevant here, but the phrase
"pervasive deceit" has stuck in my mind because I believe it perfect- 
ly captures the concerns and fears that many people are voicing 
about information on the Internet. There seems to be a sense that dig- 
ital information needs to be held to a higher standard for authentici- 
ty and integrity than has printed information. In other words, many 
people feel that in an environment characterized by pervasive deceit, 
it will be necessary to provide verifiable proof for claims related to 
authorship and integrity that would usually be taken at face value in 
the physical world. For example, although forgeries are always a
concern in the art world, one seldom hears concerns about (apparently)
mass-produced physical goods (books, journal issues, audio
CDs) being undetected and undetectable fakes. 1

This distrust of the immaterial world of digital information has 
forced us to closely and rigorously examine definitions of authentici- 
ty and integrity — definitions that we have historically been rather 
glib about — using the requirements for verifiable proofs as a bench- 
mark. As this paper will demonstrate, authenticity and integrity, 
when held to this standard, are elusive properties. It is much easier 
to devise abstract definitions than testable ones. When we try to de- 
fine integrity and authenticity with precision and rigor, the defini- 
tions recurse into a wilderness of mirrors, of questions about trust 
and identity in the networked information world. 

While there is widespread distrust of the digital environment, 
there also seems to be considerable faith and optimism about the po- 
tential for information technology to address concerns about authen- 
ticity and integrity. Those unfamiliar with the details of cryptograph- 
ic technology assume the magical arsenal of this technology has 
solved the problems of certifying authorship and integrity. Moreover, 
there seems to be an assumption that the solutions are not deployed 
yet because of some perverse reluctance to implement the necessary 
tools and infrastructure. 2 This paper will take a critical view of these



1 Confusingly, however, we have the appearance of perfect forgeries (at least in 
terms of content; the packaging is often substandard) of digital goods in the form 
of pirate audio CDs, DVDs, and software CD-ROMs. In these cases, the purpose 
is not usually intellectual fraud so much as commercial fraud through piracy. 

One might argue that these copies have integrity (they are, after all, bitwise 
equivalent); however, their authenticity is dubious, or at least needs to be proved 
by comparison with copies that have a provenance that can be documented. 
Another case that bears consideration and helps refine our thinking is the 
bootleg or "gray-market" recording — perhaps an audio CD of a live performance 
of a well-known band, released without the authorization of the performers and 
not on their usual record label. This does not stop the recording from being 
authentic and accurate, albeit unauthorized. The performers may or may not be 
willing to vouch for the authenticity of the recording; alternatively, one may have 
to rely on the evidence of the content (i.e., nobody else sounds like that) and, 
possibly, metadata provided by a third party that potentially has its own 
provenance. 

2 It would be useful to better understand why there has not been a greater effort 
to deploy these capabilities, even though they have substantial limitations. 
Contributing factors undoubtedly include export controls and other government 
regulations on cryptography, both in the United States and elsewhere; legal and 
cryptographic technologies. It will try to distinguish between the
problems that cryptographic technologies can and cannot solve and 
how they relate to the development of infrastructure services. There 
seems to have been surprisingly little examination of these questions; 
this is itself surprising. 

Before attempting to define integrity or authenticity, it is worth 
trying to gain an intuitive sense of how the digital environment dif- 
fers from the physical world of information-bearing artifacts 
("meatspace," as some now call it). The archetypal situation is this: 
We have an object and a collection of assertions about it. The asser- 
tions may be internal, as in a claim of authorship or date and place of 
publication on the title page of a book, or external, represented in 
metadata that accompany the object, perhaps provided by third par- 
ties. We want to ask questions about the integrity of the object: Has 
the object been changed since its creation, and, if so, has this altered 
the fundamental essence of the object? (This can include asking these 
questions about accompanying assertions, either embedded in the 
object or embodied in accompanying metadata.) Further, we want to
ask questions about the authenticity of the object: If its integrity is 
intact, are the assertions that cluster around the object (including 
those embedded within it, if any) true or false? 

How do we begin to answer these questions in meatspace? There 
are only a few fundamental approaches. 

• We examine the provenance of the object (for example, the docu- 
mentation of the chain of custody) and the extent to which we 
trust and believe this documentation as well as the extent to 
which we trust the custodians themselves. 

• We perform a forensic and diplomatic examination of the object 
(both its content and its artifactual form) to ensure that its charac- 
teristics and content are consistent with the claims made about it 
and the record of its provenance. 

• We rely on signatures and seals that are attached to the object or 
the claims that come with it, or both, and evaluate their forensics 
and diplomatics and their consistency with claims and provenance. 

• For mass-produced and distributed (i.e., published) objects, we 
compare the object in hand with other versions (copies) of the ob- 
ject that may be available (which, in turn, means also assessing the 
integrity and provenance of these other versions or copies). 



liability issues involved in an infrastructure that addresses authentication and 
identity; and social and cultural concerns about privacy, accountability, and 
related topics. Patent issues are a particular problem. It is hard to develop 
infrastructure, widely deployed standards, and critical mass when key elements 
are tied up by patents. With the recent insane proliferation of patents on software 
methods, algorithms, business models, and the like, uncertainty about patent 
issues is also a serious barrier to deployment. All of these have been well covered 
in the literature and the press. What has been less well examined is the lack of 
clear, well-established economic models to support systems of authentication and 
integrity management. To put it bluntly, it is not clear who is willing to pay for 
the substantial development, deployment, and operation of such a system. While 
many people say they are worried about authenticity and integrity in a digital 
environment, it is not clear that they are willing to pay the increased costs to 
effectively address these concerns. 

In the digital environment, there are few forensics or diplomatics 3
other than the forensics and diplomatics of content itself. We
cannot evaluate inks, papers, binding technology, and similar physi- 
cal characteristics.4 We can note, just as with a physical work, that an 
essay allegedly written in 1997 that makes detailed references to 
events and publications from 1999 is either remarkably prescient or 
incorrectly dated. There are limited forensics of availability, and they 
mainly provide negative information. For example, if a document 
claims to have been written in 1998 and we have copies of it that 
were deposited on various servers in 1997 (and we trust the claims of 
the servers that the material was in fact deposited in 1997), we can 
build a case that it was first distributed no later than 1997, regardless 
of the date contained in the object. Nevertheless, this does not tell us 
when the document was written. 
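
The reasoning about "forensics of availability" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the server names, the dates, and the per-record trust flags are hypothetical, and deciding whether to trust a server's deposit record is precisely the part that code cannot settle.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# (server name, deposit date recorded by the server, do we trust that record?)
# All values below are hypothetical illustrations.
Deposit = Tuple[str, date, bool]


def latest_first_distribution(deposits: Iterable[Deposit]) -> Optional[date]:
    """Earliest trusted deposit date: the object was circulating no later than this."""
    trusted_dates = [when for _, when, trusted in deposits if trusted]
    return min(trusted_dates) if trusted_dates else None


def claimed_date_is_contradicted(claimed_written: date,
                                 deposits: Iterable[Deposit]) -> bool:
    """True if the object claims to be written after it was already deposited."""
    bound = latest_first_distribution(deposits)
    return bound is not None and claimed_written > bound


deposits = [
    ("server-a.example", date(1997, 6, 2), True),
    ("server-b.example", date(1997, 11, 15), True),
    ("server-c.example", date(1996, 1, 1), False),  # a record we do not trust
]
print(latest_first_distribution(deposits))
print(claimed_date_is_contradicted(date(1998, 3, 1), deposits))
```

The result is a bound on first distribution, not a date of writing; the evidence is negative, as the text notes.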

The fundamental concept of publication in the digital environ- 
ment — the dissemination of a large number of copies to arbitrary in- 
terested parties that are subsequently autonomously managed and 
maintained — has come under great stress from numerous factors in 
the networked information environment. These factors include, for 
example, the move from sale to licensing, limited distribution, mak- 
ing copies public for viewing without giving viewers permission to 
maintain the copies, and technical protection systems (National Re- 
search Council 2000). While the basic principle of broad distribution 
and subsequent autonomous management of copies remains valid 
and useful as a base of evidence against which to test the authentici- 
ty of documents in question, the availability of relevant and trust- 
worthy copies may be limited in the digital environment, and assess- 
ing the copies is likely to be more difficult. Moreover, the forensics 
and diplomatics of evaluating seals and signatures, and documenta- 
tion of provenance, become much more formal and computational. It 
is difficult to say whether digital seals and signatures are more or 
less compelling in the digital world than in the analog world, but 
their characters unquestionably change. Finally, provenance and 
chains of custody in the digital world begin to reflect our evaluation 
of archives and custodians as implementers and operators of "trust- 
ed systems" that enforce the integrity and provenance records of ob- 
jects entrusted to them. 

At some level, authenticity and integrity are mechanical charac- 
teristics of digital objects; they do not speak to deeper questions of 



3 It is worth carefully examining the forensic clues available when evaluating a 
digital object as an artifact. Today, many of them seem trivial, but as our history 
with digital technology grows longer, understanding them will likely become a 
specialized body of expertise. Examples include character codes, file formats, and 
formats of embedded fonts, all of which can help at least place the earliest time 
that a digital object could be created, and perhaps even provide evidence to 
argue that it was unlikely to have been created after a certain time. For an object 
that has undergone format conversions over time as part of its preservation, 
these forensic clues help only in the evaluation of the record of provenance. 

4 For digital objects created by digitizing physical artifacts, if we can identify and 
obtain access to the source physical artifact, we can apply well-established 
forensic and diplomatic analysis practices to the source object. 
whether the contents of a digital document are accurate or truthful
when judged objectively. An authentic document may faithfully 
transmit complete falsehoods. There is a hierarchy of assessment in 
operation: forensics, diplomatics, intellectual analyses of consistency 
and plausibility, and evaluations of truthfulness and accuracy. Our 
concern here is with the lower levels of this hierarchy (i.e., forensics 
and diplomatics as they are reconceived in the digital environment),
but we must recognize that conclusive evaluations at the higher lev- 
els may also provide evidence that is relevant to lower-level assess- 
ment. 

Exploring Definitions and Defining Terms: Digital Objects, Integrity, and Authenticity

The Nature of Digital Information Objects 

Before we can discuss integrity and authenticity, we must examine 
the objects to which we apply these characterizations. 

Most commonly, computer scientists are concerned with digital 
objects that are defined as a set of sequences of bits. One can then ask 
computationally based questions about whether one has the correct 
set of sequences of bits, such as whether the digital object in one's 
possession is the same as that which some entity published under a 
specific identifier at a specific point in time. However, this is a sim- 
plistic notion. There are additional factors to consider. 

Bits are not directly apprehended by the human sensory appara- 
tus — they are never truly artifacts. Instead, they are rendered, execut- 
ed, performed, and presented to people by hardware and software 
systems that interpret them. The question is how sophisticated these 
environmental hardware and software systems are and how integral 
they are to the understanding of the bits. In some cases, the focus is 
purely on the bits: numeric data files, or sensor outputs, for example, 
that are manipulated by computational or visualization programs. 
Documentary objects are characterized primarily by their bits (think 
of simple ASCII text), but the craft of publishing begins to make a 
sensory presentation of this collection of bits — to turn content into 
experience. Text, marked up in HTML and displayed through a Web 
browser, takes on a sensory dimension; the words that make up the 
text being rendered no longer tell the whole story. Digital objects that 
are performed — music, video, images that are rendered on screen — 
incorporate a stronger sensory component. Issues of interaction with 
the human sensory system — psychoacoustics, quality of reproduc- 
tion, visual artifacts, and the like — become more important. The bits 
may be the same across space and time, but because of differences in 
the hardware and software used by recipients, the experience of 
viewing them may vary substantially. This raises questions about 
how to define and measure authenticity and integrity. In the most 
extreme case, we have objects that are rendered experientially — vid- 
eo games, virtual reality walk-throughs, and similar interactive 
works — where the focus shifts from the bits that constitute the digital 
object to the behavior of the rendering system, or at least to the interaction
between the digital object and the rendering system.

Thus, we might think about a hierarchy of digital objects that 
could be expressed as follows: 

(Interactive) experiential works 
Sensory presentations 
Documents 
Data 

As we move up the hierarchy, from data to experiential works, 
the questions about the integrity and authenticity of the digital ob- 
jects become more complex and perhaps more subjective; they ad- 
dress experience rather than documentary content (Lynch 2000). This 
paper will focus on the lower part of the digital object hierarchy. The 
upper part is poorly understood and today is addressed only in a 
limited way; for example, through discussions about emulation as a 
preservation strategy (Rothenberg 1999, 1995). It seems conceivable 
that one could extend some of the observations and assertions dis- 
cussed later in this paper to the more experiential works by perform- 
ing computations on the output of the renderings rather than on the 
objects themselves. However, this approach is fraught with problems 
involving canonical representations of the user interface (which, in 
the most complex cases, involves interaction and not just presenta- 
tion) and agreeing on what constitutes the authentic experience of 
the work. 

In meatspace, we cheerfully extend the notion of authenticity to 
much more than objects; in fact, we explicitly apply it to the experiential
sphere, speaking of an "authentic" performance of a baroque
concerto or an "authentic" Hawaiian luau. To the extent that we can 
make the extension and expansion of the use of authenticity as a 
characteristic precise within the framework and terminology of this 
paper, these statements seem to parallel statements about integrity of 
what in the digital environment could be viewed as experiential 
works, or performance. 

Even as we struggle with definitions and tests of integrity and 
authenticity for intellectual works in the digital environment, we are 
seeing new classes of digital objects — for example, e-cash and digital 
bearer bonds — that explicitly involve and rely upon stylized and pre- 
cise manipulation of provenance, authenticity, identity and anonymi- 
ty, and integrity within a specific trust framework and infrastructure. 
While these fit somewhere between data and documents in the digi- 
tal object hierarchy, they are interesting because they derive their 
meaning and significance from their explicit interaction with frame- 
works of integrity, authenticity, provenance, and trust. 

Canonicalization and (Computational) Essence 

Often, we seek to discuss the essence of a work rather than the exact 
set of sequences of bits that may represent it in a specific context; we 
are concerned with integrity and authenticity as they apply to this 
essence, rather than to the literal bits. Discussions of essence become 
more problematic as we move up the digital object hierarchy. However,
even at the lower levels of data and documents, we encounter a
troublesome imprecision that is a barrier to making definitions oper- 
ational computationally when we move beyond the literal definition 
of precisely equivalent sets of sequences of bits. Those approaching 
the question from a literary or documentary perspective cast the is- 
sue in a palette of grays: there are series (not necessarily a strict hier- 
archy; at best a partial ordering) of intellectual abstractions of a doc- 
ument that capture its essence at various levels, and the key problem 
is whether this abstract essence is retained. The abstraction may in- 
volve words, layout, typography, or even the feel of the pages. Are 
hardcover and paperback editions of a book equivalent? Does equiv- 
alence depend on whether the pagination is identical? Elsewhere, I 
have proposed canonicalization as a method of making such abstrac- 
tions precise (Lynch 1999). The fundamental point of canonicaliza- 
tion as an organizing principle is that it defines computational algo- 
rithms (called "canonicalizations") that can be used to extract the 
"essence" of documents according to various definitions of what 
constitutes that essence. If we have such computational procedures 
for extracting the essence of digital objects, we can then compare dig- 
ital objects through the prism of that definition of essence. We can 
also make assertions that involve abstract representations of this es- 
sence, rather than more specific (and presumably haphazard) repre- 
sentations that incorporate extraneous characteristics. 

The hard problem, of course, is precisely defining and achieving 
a consensus about the right canonicalization algorithm, or algo- 
rithms, for a given context. 
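
To make the idea concrete, here is a minimal sketch of one possible canonicalization, written in Python. Everything about it is an assumption chosen for illustration: this definition of "essence" keeps the sequence of words and discards line breaks and spacing, and SHA-256 stands in for whatever digest algorithm a community might agree on. A definition of essence that cares about pagination or typography would need a different function.

```python
import hashlib
import re
import unicodedata


def canonicalize_text(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical canonicalization: keep the words, discard the layout."""
    text = raw.decode(encoding)
    text = unicodedata.normalize("NFC", text)  # one form for accented characters
    text = re.sub(r"\s+", " ", text).strip()   # collapse line breaks and runs of spaces
    return text.encode("utf-8")


def essence_digest(raw: bytes) -> str:
    """Digest of the canonical form rather than of the literal bits."""
    return hashlib.sha256(canonicalize_text(raw)).hexdigest()


def equivalent_under_canonicalization(a: bytes, b: bytes) -> bool:
    return essence_digest(a) == essence_digest(b)


# Two renderings of the same words with different line breaks and spacing.
edition_one = b"The committee met in January.  It adjourned\nwithout a vote."
edition_two = b"The committee met in January. It\nadjourned without a vote."

print(edition_one == edition_two)                                    # literal bits differ
print(equivalent_under_canonicalization(edition_one, edition_two))   # same words
```

The comparison is only as meaningful as the agreement behind the canonicalization function; the code makes the definition testable but does not make it right.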

Integrity 

When we say that a digital object has "integrity," we mean that it has 
not been corrupted over time or in transit; in other words, that we 
have in hand the same set of sequences of bits that came into exist- 
ence when the object was created. The introduction of appropriate 
canonicalization algorithms allows us to consider the integrity of 
various abstractions of the object, rather than of the literal bits that 
make it up, and to operationalize this discussion of abstractions into 
equality of sets of sequences of bits produced by the canonicalization 
algorithm. 

When we seek to test the integrity of an object, however, we en- 
counter paradoxes and puzzles. One way to test integrity is to com- 
pare the object in hand with a copy that is known to be "true." 5 Yet, 
if we have a secure channel to a known true copy, we can simply 



5 As soon as we begin to speak of copies, however, we need to be very careful. 
Unless we know the location of the copy through some external (contextual) 
information, we run the risk of confusing authenticity and integrity. For example, 
if we have an object that includes a claim that "the identifier of this object is N" 
and we simply go looking for copies of objects with identifier N on a server that 
we trust, and then securely compare the object in hand with one of these copies, 
what we have really done is simply to trust the server to make statements about 
the assignment of the identifier N and then confirmed we had an accurate copy 
of the object with that identifier in hand. The key difference is between trusting 
the server to keep a true copy of an object in a known place and trusting the 
server to vouch for the assignment of an identifier to an object. 
take a duplicate of the known true copy. We do not need to worry
about the accuracy of the copy in hand, unless the point of the exer- 
cise is to ensure that the copy in hand is correct — for example, to de- 
tect an attempt at fraud, rather than to be sure that we have a correct 
copy. These are subtly different questions. 6 

If we do not have secure access to an independently maintained, 
known true copy of the object (or at least a digest surrogate), then 
our testing of integrity is limited to internal consistency checking. If 
the object is accompanied by an authenticated ("digitally signed") 
digest, we can check whether the object is consistent with the digest 
(and thus whether its integrity has been maintained) by recomputing 
the digest from the object in hand and then comparing it with the 
authenticated digest. But our confidence in the integrity of the object 
is only as good as our confidence in the authenticity and integrity of 
the digest. We have only changed the locus of the question to say 
that if the digest is authentic and accurate, then we can trust the in- 
tegrity of the object. Verifying integrity is no different from verifying 
the authenticity of a claim that "the correct message digest for this 
object is M" without assigning a name to the object. The linkage be- 
tween claim and object is done by association and context — by keep- 
ing the claim bound with the object, perhaps within the scope of a 
trusted processing system such as an object repository. 
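
The internal consistency check described above can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the digest algorithm and assuming the digest has travelled with the object; it deliberately omits verification of the signature on the digest, which is where the real question of trust lies.

```python
import hashlib
import hmac


def compute_digest(data: bytes) -> str:
    """Digest of the literal bits (SHA-256 is an assumption, not a requirement)."""
    return hashlib.sha256(data).hexdigest()


def consistent_with_digest(data: bytes, claimed_digest: str) -> bool:
    """Recompute the digest from the object in hand and compare it with the claim."""
    return hmac.compare_digest(compute_digest(data), claimed_digest)


original = b"minutes of the meeting, 24 January 2000"
accompanying_digest = compute_digest(original)  # assumed to accompany the object

print(consistent_with_digest(original, accompanying_digest))
print(consistent_with_digest(original + b" (amended)", accompanying_digest))
```

A successful check shows only that the object matches the digest; it says nothing about whether the digest itself is authentic and accurate.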

In the digital environment, we also commonly encounter the is- 
sue of what might be termed "situational" integrity, i.e., the integrity 
of derivative works. Consider questions such as "Is this an accurate 
transcript?", "Is this a correct translation?", or "Is this the best possi- 
ble version given a specific set of constraints on display capability?" 
Here we are raising a pair of questions: one about the integrity of a 
base object, and another about the correctness of a computation or 
other transformation applied to the object. (To be comprehensive, we 
must also consider the integrity of the result of the computation or 
transformation after it has been produced.) This usually boils down
to trust in the source or provider of the computation or transforma- 
tion, and thus to a question of authentication of source or of validity, 
integrity, and correctness of code. 

Authenticity 

Validating authenticity entails verifying claims that are associated 
with an object — in effect, verifying that an object is indeed what it 



6 One thing that we can do with cryptographic technology — specifically digest 
algorithms — is to test whether two copies of an object are identical without 
actually exchanging the object. This is important in contexts where economics 
and intellectual property come into play. For example, a publisher that is offering 
copies of a digital document for license can also offer a verification service, where 
the holder of a copy of a digital object can verify its integrity without having to 
purchase access to a new copy. Or, two institutions, each of which holds a copy of 
a digital object but does not have the rights to share it with another institution, can
verify that they hold the same object. Digest algorithms are also useful for 
efficiency purposes, because they avoid the need to transmit copies of what may 
be very large objects in order to test integrity. We should note that digest
comparisons yield only probabilistic assurance, however; the algorithms are designed to
make it very unlikely that two different objects (particularly two similar but 
distinct documents) will have the same digest. 
claims to be, or what it is claimed to be (by external metadata). For
example, an object may claim to be created on a given date, to be au- 
thored by a specific person, or to be the object that corresponds with 
a name or identifier assigned by some organization. Some claims 
may be more mechanistic and indirect than others. For example, a 
claim that "This object was deposited in a given repository by an entity
holding this public/private key pair at this time" might be used
as evidence to support authorship or precedence in discovery. Typi- 
cally, claims are linked to an object in such a way that they include, 
at least implicitly, a verification of integrity of the object about which 
claims are made. Rather than simply speaking of the (implied) object 
accompanying the claim (under the assumption that the correct ob- 
ject will be kept with the claims, and that the object management en- 
vironment will ensure the integrity of the object) one may include a 
message digest (and any necessary information about canonicaliza- 
tion algorithms to be applied prior to computing the digest) as part 
of the metadata assertion that embodies the claim. 

It is important to note that tests of authenticity deal only with 
specific claims (for example, "did X author this document?") and not 
with open-ended inquiry ("Who wrote it?"). Validating the authen- 
ticity of an object is more limited than is an open-ended inquiry into 
its nature and provenance. 

There are two basic strategies for testing a claim. The first is to 
believe the claim because we can verify its integrity and authenticate 
its source, and because we choose to trust the source. In other words, 
we validate the claim that "A is the author of the object with digest 
X" by first verifying the integrity of the object relative to the claim 
(that it has digest X), and then by checking that the claim is authenti- 
cated (i.e., digitally signed) by a trusted entity (T). The heart of the 
problem is ensuring that we are certain who T really is, and that T 
really makes or warrants the claim. The second strategy is what we 
might call "independent verification" of the claim. For example, if 
there is a national author registry that we trust, we might verify that 
the data in the author registry are consistent with the claim of au- 
thorship. In both cases, however, validating a claim that is associated 
with an object ultimately means nothing more or less than making 
the decision to trust some entity that makes or warrants the claim. 
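
The first strategy can be sketched in Python as two steps: check the object against the digest named in the claim, then check that the claim is warranted by an entity we have chosen to trust. The sketch is an illustration under loud assumptions. A keyed hash (HMAC) with a shared secret stands in for a real digital signature, the table of trusted keys stands in for a certificate and trust infrastructure, and the names `sign_claim`, `validate_claim`, and `TRUSTED_KEYS` are hypothetical.

```python
import hashlib
import hmac
import json

# Stand-in for a trust infrastructure: the entities we choose to trust and the
# keys we hold for them. A real system would use public-key signatures.
TRUSTED_KEYS = {"T": b"hypothetical-shared-secret"}


def _payload(claim: dict) -> bytes:
    """Serialize the claim deterministically so both sides hash the same bytes."""
    return json.dumps(claim, sort_keys=True).encode("utf-8")


def sign_claim(claim: dict, signer: str, key: bytes) -> dict:
    tag = hmac.new(key, _payload(claim), hashlib.sha256).hexdigest()
    return {"claim": claim, "signer": signer, "tag": tag}


def validate_claim(obj: bytes, signed: dict) -> bool:
    claim = signed["claim"]
    # Step 1: integrity of the object relative to the claim (it has digest X).
    if hashlib.sha256(obj).hexdigest() != claim["digest"]:
        return False
    # Step 2: the claim is warranted by an entity we have decided to trust.
    key = TRUSTED_KEYS.get(signed["signer"])
    if key is None:
        return False
    expected = hmac.new(key, _payload(claim), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["tag"])


document = b"An essay on authenticity."
claim = {"author": "A", "digest": hashlib.sha256(document).hexdigest()}
signed = sign_claim(claim, "T", TRUSTED_KEYS["T"])

print(validate_claim(document, signed))           # object and claim both check out
print(validate_claim(document + b"!", signed))    # object no longer matches digest X
```

Nothing in the code establishes who T really is or why T should be believed; that decision is encoded in the contents of `TRUSTED_KEYS` before the computation starts.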

Several final points about authenticity merit attention. First, trust 
in the maker or warrantor of a claim is not necessarily binary; in the 
real world, we deal with levels of confidence or degrees of trust. Sec- 
ond, many claims may accompany an object; in evaluating different 
claims, we may assign them differing degrees of confidence or trust. 
Thus, it does not necessarily make sense to speak about checking the 
authenticity of an object as if it were a simple true-or-false test — a 
computation that produces a one or a zero. It may be more construc- 
tive to think about checking authenticity as a process of examining 
and assigning confidence to a collection of claims. Finally, claims 
may be interdependent. For example, an object may be accompanied 
by claims that "This is the object with identifier N," and "The object 
with identifier N was authored by A" (the second claim, of course, is 
independent of the document itself, in some sense). Perhaps more
interesting, in an archival context, would be claims that "This object 
was derived from the object with message digest M by a specific re- 
formatting process" and "The object with message digest M was au- 
thored by A." (See Lynch 1999 for a more detailed discussion of this 
case.) 
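
One way to picture authenticity checking as "assigning confidence to a collection of claims" is the following Python sketch. It is purely illustrative: the numeric trust values are invented, the rule of multiplying confidence along a chain of dependent claims is an assumption made for simplicity rather than anything proposed in this paper, and the claims are assumed not to depend on one another circularly.

```python
from dataclasses import dataclass

# Hypothetical degrees of trust in the entities that warrant claims.
TRUST = {"repository": 0.9, "author_registry": 0.8}


@dataclass(frozen=True)
class Claim:
    name: str
    statement: str
    warrantor: str
    depends_on: tuple = ()  # names of claims this one relies on


def confidence(name: str, claims: dict, trust: dict) -> float:
    """Trust in the warrantor, discounted by confidence in each prerequisite claim."""
    claim = claims[name]
    score = trust.get(claim.warrantor, 0.0)  # unknown warrantors earn no confidence
    for dep in claim.depends_on:
        score *= confidence(dep, claims, trust)
    return score


claims = {
    "identity": Claim("identity", "This is the object with identifier N",
                      "repository"),
    "authorship": Claim("authorship", "The object with identifier N was authored by A",
                        "author_registry", depends_on=("identity",)),
}

for name in claims:
    print(name, round(confidence(name, claims, TRUST), 2))
```

The output is a set of graded judgments, one per claim, rather than a single true-or-false verdict on the object.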

Comparing Integrity and Authenticity 

It is an interesting, and possibly surprising, conclusion that in the 
digital environment, tests of integrity can be viewed as just special 
cases and byproducts of evaluations of authenticity. Part of this 
comes from the perspective of the environment of "pervasive deceit" 
and the idea that checking integrity of an object means comparing it 
with some precisely identified and rigorously vetted "original ver- 
sion" or "authoritative copy." In fact, much of the checking for integ- 
rity in the physical world is not about ferreting out pervasive deceit 
and malice, but rather about accepting artifacts for roughly what 
they seem to be on face value and then looking for evidence of damage
or corruption (e.g., torn-out pages or redacted text). For this kind
of integrity checking, a message digest that accompanies a digital 
object as metadata serves as an effective mechanism to ensure that 
the object has not been damaged or corrupted. This is true even if the 
message digest is not supported by an elaborate signature chain and 
trust assessment, but only by a general level of confidence in the 
computational context in which the objects are being stored and 
transmitted. In the digital environment, there is a tendency to down- 
play the need for this kind of integrity checking in favor of stronger 
measures that combine authenticity claims with integrity checks. 

The Role of Copies 

David Levy argues that all digital objects are copies; this echoes the 
findings of the National Research Council Committee on Intellectual 
Property in the Emerging Information Infrastructure that use — read- 
ing, for example — implies the making of copies (National Research 
Council 2000). If we accept this view, authenticity can be viewed as 
an assessment that we make about something in the present — some- 
thing that we have in hand — relative to claims about the past (prede- 
cessor copies). The persistent question is whether a given object X 
has the same properties as object Y. There is no "original." This is 
particularly relevant when we are dealing with dynamic objects such 
as databases, where an economy of copies is meaningless. In such 
cases, there is no question of authenticity through comparison with 
other copies; there is only trust or lack of trust in the location and de- 
livery processes and, perhaps, in the archival custodial chain. 

Provenance 

The term provenance comes up often in discussions of authenticity 
and integrity. Provenance, broadly speaking, is documentation about 
the origin, characteristics, and history of an object; its chain of custo- 
dy; and its relationship to other objects. The final point is particularly 
important. There are two ways to think about a digital object that is
created by changing the format of an older object that has been vali- 
dated according to some specific canonicalization algorithm. We 
might think about a single object the provenance of which includes a 
particular transformation, or we might think about multiple objects 
that are related through provenance documentation. Thus, prove- 
nance is not simply metadata about an object — it can also be metadata 
that describe the relationships between objects. Because provenance 
also includes claims about objects, it is part of the authentication and 
trust infrastructures and frameworks. 

I do not believe that we have a clear understanding of (and surely
not consensus about) where provenance data should be maintained
in the digital environment, or by what agencies. Indeed, it is
not clear to what extent the record of provenance exists independently
and permanently, as opposed to being assembled when needed
from various pools of metadata that may be maintained by various
systems in association with the digital objects that they manage. We
also lack well-developed metadata element sets and interchange
structures for documenting provenance. It seems possible that the
Dublin Core, augmented by semantics for signing metadata assertions,
might form a foundation for this, although attributes such as
relationship would need to be extended to allow for very precise
vocabularies to describe algorithmically based derivations of objects
from other objects (or transformations of objects). We would probably
also need to incorporate metadata assertions that allow an entity
to record claims such as "Object X is equivalent to object Y under
canonicalization C."
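
A claim of the form "Object X is equivalent to object Y under
canonicalization C" might be recorded and checked roughly as follows.
This is only a sketch under stated assumptions: the canonicalization
shown (normalizing line endings and trailing whitespace) is a
hypothetical stand-in for C, the field names are invented for
illustration, and a keyed hash (HMAC) stands in for the public-key
signature that a real trust infrastructure would more likely use.

```python
import hashlib
import hmac
import json
from typing import Callable


def canonicalize_text(data: bytes) -> bytes:
    """Hypothetical canonicalization C: normalize line endings and
    strip trailing whitespace from every line."""
    text = data.decode("utf-8").replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).encode("utf-8")


def equivalent_under(x: bytes, y: bytes,
                     canon: Callable[[bytes], bytes]) -> bool:
    """True if X and Y have the same digest after canonicalization."""
    return (hashlib.sha256(canon(x)).digest()
            == hashlib.sha256(canon(y)).digest())


def make_assertion(x_id: str, y_id: str, canon_name: str,
                   asserted_by: str, key: bytes) -> dict:
    """Record the equivalence claim and bind it to the asserting party.
    The HMAC tag is an assumption standing in for a digital signature."""
    claim = {
        "relation": "equivalent-under-canonicalization",
        "object_x": x_id,
        "object_y": y_id,
        "canonicalization": canon_name,
        "asserted_by": asserted_by,
    }
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "tag": tag}


def verify_assertion(assertion: dict, key: bytes) -> bool:
    """Check that the claim has not been altered since it was tagged."""
    payload = json.dumps(assertion["claim"], sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, assertion["tag"])
```

Verifying the tag establishes only that the claim is intact and came
from a holder of the key; whether to believe the claim remains a
question of trust in the asserting party.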

Watermarks, Authenticity, and Integrity 

In the most general sense, watermarking can be viewed as an attempt
to ensure that a set of claims is inseparably bound to a digital
object and thus can be assumed to travel with the object; one does
not have to trust transport and storage systems to correctly perform
this function. The most common use of watermarks today is to help
protect intellectual property by attaching a copyright claim (and possibly
an object-specific serial number to allow tracing of individual
copies) to an object. Software exists to scan public Web sites for objects
that contain watermarks and to notify the rights holders about
where these objects have been found. A serial number, if present,
helps the rights holder not only identify the presence of a possibly
illegal copy but also determine where it came from. Various trusted
system-based architectures for the control of copyrighted works have
also been proposed that use watermarking (for example, the Secure
Digital Music Initiative [2000]). The idea is that devices will refuse to
play, print, or otherwise process digital objects if the appropriate
watermarks are not present.7 The desirable properties of watermarks
include being very hard to remove computationally (at least without
knowledge of the private key as well as the algorithm used to generate
the watermark) and being resilient under various alterations that